Merge branch 'for-linus' of git://git.infradead.org/users/eparis/notify

author Linus Torvalds <torvalds@linux-foundation.org>

Tue, 10 Aug 2010 18:39:13 +0000 (11:39 -0700)

committer Linus Torvalds <torvalds@linux-foundation.org>

Tue, 10 Aug 2010 18:39:13 +0000 (11:39 -0700)
diff --combined Documentation/feature-removal-schedule.txt

index 56cee47,a8188bd..b16cbe4
--- 1/Documentation/feature-removal-schedule.txt
--- 2/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@@ -93,7 -93,7 +93,7 @@@ Why:  Broken design for runtime control 
         inputs.  This framework was never widely used, and most attempts to
         use it were broken.  Drivers should instead be exposing domain-specific
         interfaces either to kernel or to userspace.
- -Who:  Pavel Machek <pavel@suse.cz>
+ +Who:  Pavel Machek <pavel@ucw.cz>
   
   ---------------------------
   
@@@ -116,6 -116,29 +116,6 @@@ Who:      Mauro Carvalho Chehab <mchehab@inf
   
   ---------------------------
   
- -What: PCMCIA control ioctl (needed for pcmcia-cs [cardmgr, cardctl])
- -When: 2.6.35/2.6.36
- -Files:        drivers/pcmcia/: pcmcia_ioctl.c
- -Why:  With the 16-bit PCMCIA subsystem now behaving (almost) like a
- -      normal hotpluggable bus, and with it using the default kernel
- -      infrastructure (hotplug, driver core, sysfs) keeping the PCMCIA
- -      control ioctl needed by cardmgr and cardctl from pcmcia-cs is
- -      unnecessary and potentially harmful (it does not provide for
- -      proper locking), and makes further cleanups and integration of the
- -      PCMCIA subsystem into the Linux kernel device driver model more
- -      difficult. The features provided by cardmgr and cardctl are either
- -      handled by the kernel itself now or are available in the new
- -      pcmciautils package available at
- -      http://kernel.org/pub/linux/utils/kernel/pcmcia/
- -
- -      For all architectures except ARM, the associated config symbol
- -      has been removed from kernel 2.6.34; for ARM, it will be likely
- -      be removed from kernel 2.6.35. The actual code will then likely
- -      be removed from kernel 2.6.36.
- -Who:  Dominik Brodowski <linux@dominikbrodowski.net>
- -
- ----------------------------
- -
   What: sys_sysctl
   When: September 2010
   Option: CONFIG_SYSCTL_SYSCALL
@@@ -151,31 -174,6 +151,31 @@@ Who:     Eric Biederman <ebiederm@xmission.
   
   ---------------------------
   
+ +What: /proc/<pid>/oom_adj
+ +When: August 2012
+ +Why:  /proc/<pid>/oom_adj allows userspace to influence the oom killer's
+ +      badness heuristic used to determine which task to kill when the kernel
+ +      is out of memory.
+ +
+ +      The badness heuristic has since been rewritten since the introduction of
+ +      this tunable such that its meaning is deprecated.  The value was
+ +      implemented as a bitshift on a score generated by the badness()
+ +      function that did not have any precise units of measure.  With the
+ +      rewrite, the score is given as a proportion of available memory to the
+ +      task allocating pages, so using a bitshift which grows the score
+ +      exponentially is, thus, impossible to tune with fine granularity.
+ +
+ +      A much more powerful interface, /proc/<pid>/oom_score_adj, was
+ +      introduced with the oom killer rewrite that allows users to increase or
+ +      decrease the badness() score linearly.  This interface will replace
+ +      /proc/<pid>/oom_adj.
+ +
+ +      A warning will be emitted to the kernel log if an application uses this
+ +      deprecated interface.  After it is printed once, future warnings will be
+ +      suppressed until the kernel is rebooted.
+ +
+ +---------------------------
+ +
   What: remove EXPORT_SYMBOL(kernel_thread)
   When: August 2006
   Files:        arch/*/kernel/*_ksyms.c
@@@ -305,6 -303,15 +305,6 @@@ Who:      Johannes Berg <johannes@sipsolutio
   
   ---------------------------
   
- -What: CONFIG_NF_CT_ACCT
- -When: 2.6.29
- -Why:  Accounting can now be enabled/disabled without kernel recompilation.
- -      Currently used only to set a default value for a feature that is also
- -      controlled by a kernel/module/sysfs/sysctl parameter.
- -Who:  Krzysztof Piotr Oledzki <ole@ans.pl>
- -
- ----------------------------
- -
   What: sysfs ui for changing p4-clockmod parameters
   When: September 2009
   Why:  See commits 129f8ae9b1b5be94517da76009ea956e89104ce8 and
@@@ -360,16 -367,18 +360,8 @@@ When:    2.6.3
   Why:  Should be implemented in userspace, policy daemon.
   Who:  Johannes Berg <johannes@sipsolutions.net>
   
- ---------------------------
- 
- What: CONFIG_INOTIFY
- When: 2.6.33
- Why:  last user (audit) will be converted to the newer more generic
-       and more easily maintained fsnotify subsystem
- Who:  Eric Paris <eparis@redhat.com>
- 
   ----------------------------
   
- -What: lock_policy_rwsem_* and unlock_policy_rwsem_* will not be
- -      exported interface anymore.
- -When: 2.6.33
- -Why:  cpu_policy_rwsem has a new cleaner definition making it local to
- -      cpufreq core and contained inside cpufreq.c. Other dependent
- -      drivers should not use it in order to safely avoid lockdep issues.
- -Who:  Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
- -
- -----------------------------
- -
   What: sound-slot/service-* module aliases and related clutters in
         sound/sound_core.c
   When: August 2010
@@@ -442,6 -451,57 +434,6 @@@ Who:      Corentin Chary <corentin.chary@gma
   
   ----------------------------
   
- -What: usbvideo quickcam_messenger driver
- -When: 2.6.35
- -Files:        drivers/media/video/usbvideo/quickcam_messenger.[ch]
- -Why:  obsolete v4l1 driver replaced by gspca_stv06xx
- -Who:  Hans de Goede <hdegoede@redhat.com>
- -
- -----------------------------
- -
- -What: ov511 v4l1 driver
- -When: 2.6.35
- -Files:        drivers/media/video/ov511.[ch]
- -Why:  obsolete v4l1 driver replaced by gspca_ov519
- -Who:  Hans de Goede <hdegoede@redhat.com>
- -
- -----------------------------
- -
- -What: w9968cf v4l1 driver
- -When: 2.6.35
- -Files:        drivers/media/video/w9968cf*.[ch]
- -Why:  obsolete v4l1 driver replaced by gspca_ov519
- -Who:  Hans de Goede <hdegoede@redhat.com>
- -
- -----------------------------
- -
- -What: ovcamchip sensor framework
- -When: 2.6.35
- -Files:        drivers/media/video/ovcamchip/*
- -Why:  Only used by obsoleted v4l1 drivers
- -Who:  Hans de Goede <hdegoede@redhat.com>
- -
- -----------------------------
- -
- -What: stv680 v4l1 driver
- -When: 2.6.35
- -Files:        drivers/media/video/stv680.[ch]
- -Why:  obsolete v4l1 driver replaced by gspca_stv0680
- -Who:  Hans de Goede <hdegoede@redhat.com>
- -
- -----------------------------
- -
- -What: zc0301 v4l driver
- -When: 2.6.35
- -Files:        drivers/media/video/zc0301/*
- -Why:  Duplicate functionality with the gspca_zc3xx driver, zc0301 only
- -      supports 2 USB-ID's (because it only supports a limited set of
- -      sensors) wich are also supported by the gspca_zc3xx driver
- -      (which supports 53 USB-ID's in total)
- -Who:  Hans de Goede <hdegoede@redhat.com>
- -
- -----------------------------
- -
   What: sysfs-class-rfkill state file
   When: Feb 2014
   Files:        net/rfkill/core.c
@@@ -470,6 -530,37 +462,6 @@@ Who:      Jan Kiszka <jan.kiszka@web.de
   
   ----------------------------
   
- -What: KVM memory aliases support
- -When: July 2010
- -Why:  Memory aliasing support is used for speeding up guest vga access
- -      through the vga windows.
- -
- -      Modern userspace no longer uses this feature, so it's just bitrotted
- -      code and can be removed with no impact.
- -Who:  Avi Kivity <avi@redhat.com>
- -
- -----------------------------
- -
- -What: xtime, wall_to_monotonic
- -When: 2.6.36+
- -Files:        kernel/time/timekeeping.c include/linux/time.h
- -Why:  Cleaning up timekeeping internal values. Please use
- -      existing timekeeping accessor functions to access
- -      the equivalent functionality.
- -Who:  John Stultz <johnstul@us.ibm.com>
- -
- -----------------------------
- -
- -What: KVM kernel-allocated memory slots
- -When: July 2010
- -Why:  Since 2.6.25, kvm supports user-allocated memory slots, which are
- -      much more flexible than kernel-allocated slots.  All current userspace
- -      supports the newer interface and this code can be removed with no
- -      impact.
- -Who:  Avi Kivity <avi@redhat.com>
- -
- -----------------------------
- -
   What: KVM paravirt mmu host support
   When: January 2011
   Why:  The paravirt mmu host support is slower than non-paravirt mmu, both
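
Note on the /proc/<pid>/oom_adj entry added above: its text contrasts a bitshift adjustment with a linear one. The sketch below is only a userspace model of that contrast, not kernel code. The helper names `demo_shift_adjust` and `demo_linear_adjust` are hypothetical, and clamping the linear result at zero is an assumption made for the illustration.

```c
#include <assert.h>

/*
 * Hypothetical model of the old oom_adj tunable: a positive value
 * left-shifts the score, a negative value right-shifts it, so each
 * step doubles or halves the result.
 */
static unsigned long demo_shift_adjust(unsigned long points, int oom_adj)
{
	/* Keep the shift count well inside the width of unsigned long. */
	assert(oom_adj > -17 && oom_adj <= 15);

	if (oom_adj > 0)
		return points << oom_adj;
	if (oom_adj < 0)
		return points >> -oom_adj;
	return points;
}

/*
 * Hypothetical model of a linear tunable in the spirit of
 * oom_score_adj: the adjustment is simply added to the score.
 * Clamping at zero is an assumption of this sketch.
 */
static long demo_linear_adjust(long points, int score_adj)
{
	long adjusted = points + score_adj;

	return adjusted < 0 ? 0 : adjusted;
}
```

With a starting score of 100, the shift model can only reach 50, 200, 400 and so on, while the linear model can reach any nearby value such as 125. That is the granularity problem the entry describes.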
diff --combined fs/compat.c

index 3e57e81,ce02278..e6d5d70
--- 1/fs/compat.c
--- 2/fs/compat.c
+++ b/fs/compat.c
@@@ -8,14 -8,13 +8,14 @@@
    *  Copyright (C) 1997-2000  Jakub Jelinek  (jakub@redhat.com)
    *  Copyright (C) 1998       Eddie C. Dost  (ecd@skynet.be)
    *  Copyright (C) 2001,2002  Andi Kleen, SuSE Labs 
- - *  Copyright (C) 2003       Pavel Machek (pavel@suse.cz)
+ + *  Copyright (C) 2003       Pavel Machek (pavel@ucw.cz)
    *
    *  This program is free software; you can redistribute it and/or modify
    *  it under the terms of the GNU General Public License version 2 as
    *  published by the Free Software Foundation.
    */
   
+ +#include <linux/stddef.h>
   #include <linux/kernel.h>
   #include <linux/linkage.h>
   #include <linux/compat.h>
@@@ -267,7 -266,7 +267,7 @@@ asmlinkage long compat_sys_statfs(cons
         error = user_path(pathname, &path);
         if (!error) {
                 struct kstatfs tmp;
- -              error = vfs_statfs(path.dentry, &tmp);
+ +              error = vfs_statfs(&path, &tmp);
                 if (!error)
                         error = put_compat_statfs(buf, &tmp);
                 path_put(&path);
@@@ -285,7 -284,7 +285,7 @@@ asmlinkage long compat_sys_fstatfs(unsi
         file = fget(fd);
         if (!file)
                 goto out;
- -      error = vfs_statfs(file->f_path.dentry, &tmp);
+ +      error = vfs_statfs(&file->f_path, &tmp);
         if (!error)
                 error = put_compat_statfs(buf, &tmp);
         fput(file);
@@@ -335,7 -334,7 +335,7 @@@ asmlinkage long compat_sys_statfs64(con
         error = user_path(pathname, &path);
         if (!error) {
                 struct kstatfs tmp;
- -              error = vfs_statfs(path.dentry, &tmp);
+ +              error = vfs_statfs(&path, &tmp);
                 if (!error)
                         error = put_compat_statfs64(buf, &tmp);
                 path_put(&path);
@@@ -356,7 -355,7 +356,7 @@@ asmlinkage long compat_sys_fstatfs64(un
         file = fget(fd);
         if (!file)
                 goto out;
- -      error = vfs_statfs(file->f_path.dentry, &tmp);
+ +      error = vfs_statfs(&file->f_path, &tmp);
         if (!error)
                 error = put_compat_statfs64(buf, &tmp);
         fput(file);
@@@ -379,7 -378,7 +379,7 @@@ asmlinkage long compat_sys_ustat(unsign
         sb = user_get_super(new_decode_dev(dev));
         if (!sb)
                 return -EINVAL;
- -      err = vfs_statfs(sb->s_root, &sbuf);
+ +      err = statfs_by_dentry(sb->s_root, &sbuf);
         drop_super(sb);
         if (err)
                 return err;
@@@ -892,6 -891,8 +892,6 @@@ asmlinkage long compat_sys_mount(char _
         return retval;
   }
   
- -#define NAME_OFFSET(de) ((int) ((de)->d_name - (char __user *) (de)))
- -
   struct compat_old_linux_dirent {
         compat_ulong_t  d_ino;
         compat_ulong_t  d_offset;
@@@ -980,8 -981,7 +980,8 @@@ static int compat_filldir(void *__buf, 
         struct compat_linux_dirent __user * dirent;
         struct compat_getdents_callback *buf = __buf;
         compat_ulong_t d_ino;
- -      int reclen = ALIGN(NAME_OFFSET(dirent) + namlen + 2, sizeof(compat_long_t));
+ +      int reclen = ALIGN(offsetof(struct compat_linux_dirent, d_name) +
+ +              namlen + 2, sizeof(compat_long_t));
   
         buf->error = -EINVAL;   /* only used if we fail.. */
         if (reclen > buf->count)
@@@ -1068,8 -1068,8 +1068,8 @@@ static int compat_filldir64(void * __bu
   {
         struct linux_dirent64 __user *dirent;
         struct compat_getdents_callback64 *buf = __buf;
- -      int jj = NAME_OFFSET(dirent);
- -      int reclen = ALIGN(jj + namlen + 1, sizeof(u64));
+ +      int reclen = ALIGN(offsetof(struct linux_dirent64, d_name) + namlen + 1,
+ +              sizeof(u64));
         u64 off;
   
         buf->error = -EINVAL;   /* only used if we fail.. */
@@@ -1193,11 -1193,10 +1193,10 @@@ out
         if (iov != iovstack)
                 kfree(iov);
         if ((ret + (type == READ)) > 0) {
-               struct dentry *dentry = file->f_path.dentry;
                 if (type == READ)
-                       fsnotify_access(dentry);
+                       fsnotify_access(file);
                 else
-                       fsnotify_modify(dentry);
+                       fsnotify_modify(file);
         }
         return ret;
   }
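
Note on the compat_filldir() and compat_filldir64() hunks above: both replace the NAME_OFFSET() pointer arithmetic with offsetof() when computing the record length. The sketch below shows the same calculation in userspace, under stated assumptions: `struct demo_dirent`, `demo_align` and `demo_reclen` are hypothetical stand-ins, not the kernel's definitions, and the `+ 2` simply mirrors the constant used in compat_filldir().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record layout; the real compat structures differ. */
struct demo_dirent {
	uint32_t d_ino;
	uint32_t d_off;
	uint16_t d_reclen;
	char     d_name[1];
};

/* Round x up to the next multiple of align (align must be a power of two). */
static size_t demo_align(size_t x, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}

/*
 * Record length for a name of namlen bytes. offsetof() is a
 * compile-time constant, so no object pointer is needed.
 */
static size_t demo_reclen(size_t namlen)
{
	return demo_align(offsetof(struct demo_dirent, d_name) + namlen + 2,
			  sizeof(uint32_t));
}
```

In the removed lines, NAME_OFFSET(dirent) is evaluated in the initializer of `reclen`, where `dirent` has been declared but not yet assigned, so offsetof() yields the same number without reading that pointer.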
diff --combined fs/exec.c

index dab85ec,f2de04a..7761837
--- 1/fs/exec.c
--- 2/fs/exec.c
+++ b/fs/exec.c
@@@ -28,6 -28,7 +28,6 @@@
   #include <linux/mm.h>
   #include <linux/stat.h>
   #include <linux/fcntl.h>
- -#include <linux/smp_lock.h>
   #include <linux/swap.h>
   #include <linux/string.h>
   #include <linux/init.h>
@@@ -128,7 -129,7 +128,7 @@@ SYSCALL_DEFINE1(uselib, const char __us
         if (file->f_path.mnt->mnt_flags & MNT_NOEXEC)
                 goto exit;
   
-       fsnotify_open(file->f_path.dentry);
+       fsnotify_open(file);
   
         error = -ENOEXEC;
         if(file->f_op) {
@@@ -652,7 -653,6 +652,7 @@@ int setup_arg_pages(struct linux_binpr
         else
                 stack_base = vma->vm_start - stack_expand;
   #endif
+ +      current->mm->start_stack = bprm->p;
         ret = expand_stack(vma, stack_base);
         if (ret)
                 ret = -EFAULT;
@@@ -683,7 -683,7 +683,7 @@@ struct file *open_exec(const char *name
         if (file->f_path.mnt->mnt_flags & MNT_NOEXEC)
                 goto exit;
   
-       fsnotify_open(file->f_path.dentry);
+       fsnotify_open(file);
   
         err = deny_write_access(file);
         if (err)
@@@ -1891,7 -1891,13 +1891,7 @@@ void do_coredump(long signr, int exit_c
          */
         clear_thread_flag(TIF_SIGPENDING);
   
- -      /*
- -       * lock_kernel() because format_corename() is controlled by sysctl, which
- -       * uses lock_kernel()
- -       */
- -      lock_kernel();
         ispipe = format_corename(corename, signr);
- -      unlock_kernel();
   
         if (ispipe) {
                 int dump_count;
diff --combined fs/inode.c

index 2575244,a2da778..8646433
--- 1/fs/inode.c
--- 2/fs/inode.c
+++ b/fs/inode.c
@@@ -20,7 -20,6 +20,6 @@@
   #include <linux/pagemap.h>
   #include <linux/cdev.h>
   #include <linux/bootmem.h>
- #include <linux/inotify.h>
   #include <linux/fsnotify.h>
   #include <linux/mount.h>
   #include <linux/async.h>
@@@ -264,12 -263,8 +263,8 @@@ void inode_init_once(struct inode *inod
         INIT_RAW_PRIO_TREE_ROOT(&inode->i_data.i_mmap);
         INIT_LIST_HEAD(&inode->i_data.i_mmap_nonlinear);
         i_size_ordered_init(inode);
- #ifdef CONFIG_INOTIFY
-       INIT_LIST_HEAD(&inode->inotify_watches);
-       mutex_init(&inode->inotify_mutex);
- #endif
   #ifdef CONFIG_FSNOTIFY
-       INIT_HLIST_HEAD(&inode->i_fsnotify_mark_entries);
+       INIT_HLIST_HEAD(&inode->i_fsnotify_marks);
   #endif
   }
   EXPORT_SYMBOL(inode_init_once);
@@@ -294,34 -289,32 +289,34 @@@ void __iget(struct inode *inode
         inodes_stat.nr_unused--;
   }
   
- -/**
- - * clear_inode - clear an inode
- - * @inode: inode to clear
- - *
- - * This is called by the filesystem to tell us
- - * that the inode is no longer useful. We just
- - * terminate it with extreme prejudice.
- - */
- -void clear_inode(struct inode *inode)
+ +void end_writeback(struct inode *inode)
   {
         might_sleep();
- -      invalidate_inode_buffers(inode);
- -
         BUG_ON(inode->i_data.nrpages);
+ +      BUG_ON(!list_empty(&inode->i_data.private_list));
         BUG_ON(!(inode->i_state & I_FREEING));
         BUG_ON(inode->i_state & I_CLEAR);
         inode_sync_wait(inode);
- -      if (inode->i_sb->s_op->clear_inode)
- -              inode->i_sb->s_op->clear_inode(inode);
+ +      inode->i_state = I_FREEING | I_CLEAR;
+ +}
+ +EXPORT_SYMBOL(end_writeback);
+ +
+ +static void evict(struct inode *inode)
+ +{
+ +      const struct super_operations *op = inode->i_sb->s_op;
+ +
+ +      if (op->evict_inode) {
+ +              op->evict_inode(inode);
+ +      } else {
+ +              if (inode->i_data.nrpages)
+ +                      truncate_inode_pages(&inode->i_data, 0);
+ +              end_writeback(inode);
+ +      }
         if (S_ISBLK(inode->i_mode) && inode->i_bdev)
                 bd_forget(inode);
         if (S_ISCHR(inode->i_mode) && inode->i_cdev)
                 cd_forget(inode);
- -      inode->i_state = I_CLEAR;
   }
- -EXPORT_SYMBOL(clear_inode);
   
   /*
    * dispose_list - dispose of the contents of a local list
@@@ -340,7 -333,9 +335,7 @@@ static void dispose_list(struct list_he
                 inode = list_first_entry(head, struct inode, i_list);
                 list_del(&inode->i_list);
   
- -              if (inode->i_data.nrpages)
- -                      truncate_inode_pages(&inode->i_data, 0);
- -              clear_inode(inode);
+ +              evict(inode);
   
                 spin_lock(&inode_lock);
                 hlist_del_init(&inode->i_hash);
@@@ -413,7 -408,6 +408,6 @@@ int invalidate_inodes(struct super_bloc
   
         down_write(&iprune_sem);
         spin_lock(&inode_lock);
-       inotify_unmount_inodes(&sb->s_inodes);
         fsnotify_unmount_inodes(&sb->s_inodes);
         busy = invalidate_list(&sb->s_inodes, &throw_away);
         spin_unlock(&inode_lock);
@@@ -553,7 -547,7 +547,7 @@@ repeat
                         continue;
                 if (!test(inode, data))
                         continue;
- -              if (inode->i_state & (I_FREEING|I_CLEAR|I_WILL_FREE)) {
+ +              if (inode->i_state & (I_FREEING|I_WILL_FREE)) {
                         __wait_on_freeing_inode(inode);
                         goto repeat;
                 }
@@@ -578,7 -572,7 +572,7 @@@ repeat
                         continue;
                 if (inode->i_sb != sb)
                         continue;
- -              if (inode->i_state & (I_FREEING|I_CLEAR|I_WILL_FREE)) {
+ +              if (inode->i_state & (I_FREEING|I_WILL_FREE)) {
                         __wait_on_freeing_inode(inode);
                         goto repeat;
                 }
@@@ -840,7 -834,7 +834,7 @@@ EXPORT_SYMBOL(iunique)
   struct inode *igrab(struct inode *inode)
   {
         spin_lock(&inode_lock);
- -      if (!(inode->i_state & (I_FREEING|I_CLEAR|I_WILL_FREE)))
+ +      if (!(inode->i_state & (I_FREEING|I_WILL_FREE)))
                 __iget(inode);
         else
                 /*
@@@ -1089,7 -1083,7 +1083,7 @@@ int insert_inode_locked(struct inode *i
                                 continue;
                         if (old->i_sb != sb)
                                 continue;
- -                      if (old->i_state & (I_FREEING|I_CLEAR|I_WILL_FREE))
+ +                      if (old->i_state & (I_FREEING|I_WILL_FREE))
                                 continue;
                         break;
                 }
@@@ -1128,7 -1122,7 +1122,7 @@@ int insert_inode_locked4(struct inode *
                                 continue;
                         if (!test(old, data))
                                 continue;
- -                      if (old->i_state & (I_FREEING|I_CLEAR|I_WILL_FREE))
+ +                      if (old->i_state & (I_FREEING|I_WILL_FREE))
                                 continue;
                         break;
                 }
@@@ -1180,51 -1174,69 +1174,51 @@@ void remove_inode_hash(struct inode *in
   }
   EXPORT_SYMBOL(remove_inode_hash);
   
+ +int generic_delete_inode(struct inode *inode)
+ +{
+ +      return 1;
+ +}
+ +EXPORT_SYMBOL(generic_delete_inode);
+ +
   /*
- - * Tell the filesystem that this inode is no longer of any interest and should
- - * be completely destroyed.
- - *
- - * We leave the inode in the inode hash table until *after* the filesystem's
- - * ->delete_inode completes.  This ensures that an iget (such as nfsd might
- - * instigate) will always find up-to-date information either in the hash or on
- - * disk.
- - *
- - * I_FREEING is set so that no-one will take a new reference to the inode while
- - * it is being deleted.
+ + * Normal UNIX filesystem behaviour: delete the
+ + * inode when the usage count drops to zero, and
+ + * i_nlink is zero.
    */
- -void generic_delete_inode(struct inode *inode)
+ +int generic_drop_inode(struct inode *inode)
   {
- -      const struct super_operations *op = inode->i_sb->s_op;
- -
- -      list_del_init(&inode->i_list);
- -      list_del_init(&inode->i_sb_list);
- -      WARN_ON(inode->i_state & I_NEW);
- -      inode->i_state |= I_FREEING;
- -      inodes_stat.nr_inodes--;
- -      spin_unlock(&inode_lock);
- -
- -      if (op->delete_inode) {
- -              void (*delete)(struct inode *) = op->delete_inode;
- -              /* Filesystems implementing their own
- -               * s_op->delete_inode are required to call
- -               * truncate_inode_pages and clear_inode()
- -               * internally */
- -              delete(inode);
- -      } else {
- -              truncate_inode_pages(&inode->i_data, 0);
- -              clear_inode(inode);
- -      }
- -      spin_lock(&inode_lock);
- -      hlist_del_init(&inode->i_hash);
- -      spin_unlock(&inode_lock);
- -      wake_up_inode(inode);
- -      BUG_ON(inode->i_state != I_CLEAR);
- -      destroy_inode(inode);
+ +      return !inode->i_nlink || hlist_unhashed(&inode->i_hash);
   }
- -EXPORT_SYMBOL(generic_delete_inode);
+ +EXPORT_SYMBOL_GPL(generic_drop_inode);
   
- -/**
- - *    generic_detach_inode - remove inode from inode lists
- - *    @inode: inode to remove
- - *
- - *    Remove inode from inode lists, write it if it's dirty. This is just an
- - *    internal VFS helper exported for hugetlbfs. Do not use!
+ +/*
+ + * Called when we're dropping the last reference
+ + * to an inode.
    *
- - *    Returns 1 if inode should be completely destroyed.
+ + * Call the FS "drop_inode()" function, defaulting to
+ + * the legacy UNIX filesystem behaviour.  If it tells
+ + * us to evict inode, do so.  Otherwise, retain inode
+ + * in cache if fs is alive, sync and evict if fs is
+ + * shutting down.
    */
- -int generic_detach_inode(struct inode *inode)
+ +static void iput_final(struct inode *inode)
   {
         struct super_block *sb = inode->i_sb;
+ +      const struct super_operations *op = inode->i_sb->s_op;
+ +      int drop;
   
- -      if (!hlist_unhashed(&inode->i_hash)) {
+ +      if (op && op->drop_inode)
+ +              drop = op->drop_inode(inode);
+ +      else
+ +              drop = generic_drop_inode(inode);
+ +
+ +      if (!drop) {
                 if (!(inode->i_state & (I_DIRTY|I_SYNC)))
                         list_move(&inode->i_list, &inode_unused);
                 inodes_stat.nr_unused++;
                 if (sb->s_flags & MS_ACTIVE) {
                         spin_unlock(&inode_lock);
- -                      return 0;
+ +                      return;
                 }
                 WARN_ON(inode->i_state & I_NEW);
                 inode->i_state |= I_WILL_FREE;
@@@ -1242,15 -1254,56 +1236,15 @@@
         inode->i_state |= I_FREEING;
         inodes_stat.nr_inodes--;
         spin_unlock(&inode_lock);
- -      return 1;
- -}
- -EXPORT_SYMBOL_GPL(generic_detach_inode);
- -
- -static void generic_forget_inode(struct inode *inode)
- -{
- -      if (!generic_detach_inode(inode))
- -              return;
- -      if (inode->i_data.nrpages)
- -              truncate_inode_pages(&inode->i_data, 0);
- -      clear_inode(inode);
+ +      evict(inode);
+ +      spin_lock(&inode_lock);
+ +      hlist_del_init(&inode->i_hash);
+ +      spin_unlock(&inode_lock);
         wake_up_inode(inode);
+ +      BUG_ON(inode->i_state != (I_FREEING | I_CLEAR));
         destroy_inode(inode);
   }
   
- -/*
- - * Normal UNIX filesystem behaviour: delete the
- - * inode when the usage count drops to zero, and
- - * i_nlink is zero.
- - */
- -void generic_drop_inode(struct inode *inode)
- -{
- -      if (!inode->i_nlink)
- -              generic_delete_inode(inode);
- -      else
- -              generic_forget_inode(inode);
- -}
- -EXPORT_SYMBOL_GPL(generic_drop_inode);
- -
- -/*
- - * Called when we're dropping the last reference
- - * to an inode.
- - *
- - * Call the FS "drop()" function, defaulting to
- - * the legacy UNIX filesystem behaviour..
- - *
- - * NOTE! NOTE! NOTE! We're called with the inode lock
- - * held, and the drop function is supposed to release
- - * the lock!
- - */
- -static inline void iput_final(struct inode *inode)
- -{
- -      const struct super_operations *op = inode->i_sb->s_op;
- -      void (*drop)(struct inode *) = generic_drop_inode;
- -
- -      if (op && op->drop_inode)
- -              drop = op->drop_inode;
- -      drop(inode);
- -}
- -
   /**
    *    iput    - put an inode
    *    @inode: inode to put
@@@ -1263,7 -1316,7 +1257,7 @@@
   void iput(struct inode *inode)
   {
         if (inode) {
- -              BUG_ON(inode->i_state == I_CLEAR);
+ +              BUG_ON(inode->i_state & I_CLEAR);
   
                 if (atomic_dec_and_lock(&inode->i_count, &inode_lock))
                         iput_final(inode);
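
Note on the iput_final() hunk above: ->drop_inode() now returns an int saying whether the inode should be evicted, with generic_drop_inode() as the fallback. The sketch below models only that decision in userspace. `struct demo_inode` and every `demo_*` name are hypothetical, and locking, inode state flags and the MS_ACTIVE check are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two inode fields the decision reads. */
struct demo_inode {
	unsigned int nlink;   /* models inode->i_nlink */
	bool hashed;          /* models !hlist_unhashed(&inode->i_hash) */
};

typedef int (*demo_drop_fn)(const struct demo_inode *inode);

/* Models generic_drop_inode(): drop when unlinked or unhashed. */
static int demo_generic_drop(const struct demo_inode *inode)
{
	return !inode->nlink || !inode->hashed;
}

/* Models generic_delete_inode(): always ask for eviction. */
static int demo_always_drop(const struct demo_inode *inode)
{
	(void)inode;
	return 1;
}

/*
 * Models the choice at the top of iput_final(): use the filesystem's
 * callback when it provides one, otherwise fall back to the default.
 */
static int demo_should_evict(const struct demo_inode *inode,
			     demo_drop_fn fs_drop)
{
	assert(inode != NULL);
	return fs_drop ? fs_drop(inode) : demo_generic_drop(inode);
}
```

A filesystem that never wants to cache unreferenced inodes would pass a callback like `demo_always_drop`. A filesystem with no callback keeps linked, hashed inodes in cache.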
diff --combined fs/namei.c

index 42d2d28,3479b17..13ff4ab
--- 1/fs/namei.c
--- 2/fs/namei.c
+++ b/fs/namei.c
@@@ -282,7 -282,8 +282,7 @@@ int inode_permission(struct inode *inod
         if (retval)
                 return retval;
   
- -      return security_inode_permission(inode,
- -                      mask & (MAY_READ|MAY_WRITE|MAY_EXEC|MAY_APPEND));
+ +      return security_inode_permission(inode, mask);
   }
   
   /**
@@@ -1483,7 -1484,8 +1483,7 @@@ static int handle_truncate(struct path 
          */
         error = locks_verify_locked(inode);
         if (!error)
- -              error = security_path_truncate(path, 0,
- -                                     ATTR_MTIME|ATTR_CTIME|ATTR_OPEN);
+ +              error = security_path_truncate(path);
         if (!error) {
                 error = do_truncate(path->dentry, 0,
                                     ATTR_MTIME|ATTR_CTIME|ATTR_OPEN,
@@@ -2633,7 -2635,7 +2633,7 @@@ int vfs_rename(struct inode *old_dir, s
   {
         int error;
         int is_dir = S_ISDIR(old_dentry->d_inode->i_mode);
-       const char *old_name;
+       const unsigned char *old_name;
   
         if (old_dentry->d_inode == new_dentry->d_inode)
                 return 0;
diff --combined fs/namespace.c

index 32dcd24,1969d6b..66c4f7e
--- 1/fs/namespace.c
--- 2/fs/namespace.c
+++ b/fs/namespace.c
@@@ -29,6 -29,7 +29,7 @@@
   #include <linux/log2.h>
   #include <linux/idr.h>
   #include <linux/fs_struct.h>
+ #include <linux/fsnotify.h>
   #include <asm/uaccess.h>
   #include <asm/unistd.h>
   #include "pnode.h"
@@@ -150,6 -151,9 +151,9 @@@ struct vfsmount *alloc_vfsmnt(const cha
                 INIT_LIST_HEAD(&mnt->mnt_share);
                 INIT_LIST_HEAD(&mnt->mnt_slave_list);
                 INIT_LIST_HEAD(&mnt->mnt_slave);
+ #ifdef CONFIG_FSNOTIFY
+               INIT_HLIST_HEAD(&mnt->mnt_fsnotify_marks);
+ #endif
   #ifdef CONFIG_SMP
                 mnt->mnt_writers = alloc_percpu(int);
                 if (!mnt->mnt_writers)
@@@ -610,6 -614,7 +614,7 @@@ static inline void __mntput(struct vfsm
          * provides barriers, so count_mnt_writers() below is safe.  AV
          */
         WARN_ON(count_mnt_writers(mnt));
+       fsnotify_vfsmount_delete(mnt);
         dput(mnt->mnt_root);
         free_vfsmnt(mnt);
         deactivate_super(sb);
@@@ -1984,7 -1989,7 +1989,7 @@@ long do_mount(char *dev_name, char *dir
         if (flags & MS_RDONLY)
                 mnt_flags |= MNT_READONLY;
   
- -      flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_ACTIVE |
+ +      flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_ACTIVE | MS_BORN |
                    MS_NOATIME | MS_NODIRATIME | MS_RELATIME| MS_KERNMOUNT |
                    MS_STRICTATIME);
   
diff --combined fs/nfsd/vfs.c

index 8812f6b,16114a8..96360a8
--- 1/fs/nfsd/vfs.c
--- 2/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@@ -604,7 -604,7 +604,7 @@@ nfsd4_get_nfs4_acl(struct svc_rqst *rqs
         return error;
   }
   
- -#endif /* defined(CONFIG_NFS_V4) */
+ +#endif /* defined(CONFIG_NFSD_V4) */
   
   #ifdef CONFIG_NFSD_V3
   /*
@@@ -903,6 -903,7 +903,6 @@@ nfsd_vfs_read(struct svc_rqst *rqstp, s
                 loff_t offset, struct kvec *vec, int vlen, unsigned long *count)
   {
         struct inode *inode;
- -      struct raparms  *ra;
         mm_segment_t    oldfs;
         __be32          err;
         int             host_err;
@@@ -913,6 -914,12 +913,6 @@@
         if (svc_msnfs(fhp) && !lock_may_read(inode, offset, *count))
                 goto out;
   
- -      /* Get readahead parameters */
- -      ra = nfsd_get_raparms(inode->i_sb->s_dev, inode->i_ino);
- -
- -      if (ra && ra->p_set)
- -              file->f_ra = ra->p_ra;
- -
         if (file->f_op->splice_read && rqstp->rq_splice_ok) {
                 struct splice_desc sd = {
                         .len            = 0,
@@@ -930,11 -937,21 +930,11 @@@
                 set_fs(oldfs);
         }
   
- -      /* Write back readahead params */
- -      if (ra) {
- -              struct raparm_hbucket *rab = &raparm_hash[ra->p_hindex];
- -              spin_lock(&rab->pb_lock);
- -              ra->p_ra = file->f_ra;
- -              ra->p_set = 1;
- -              ra->p_count--;
- -              spin_unlock(&rab->pb_lock);
- -      }
- -
         if (host_err >= 0) {
                 nfsdstats.io_read += host_err;
                 *count = host_err;
                 err = 0;
-               fsnotify_access(file->f_path.dentry);
+               fsnotify_access(file);
         } else 
                 err = nfserrno(host_err);
   out:
@@@ -1045,7 -1062,7 +1045,7 @@@ nfsd_vfs_write(struct svc_rqst *rqstp, 
                 goto out_nfserr;
         *cnt = host_err;
         nfsdstats.io_write += host_err;
-       fsnotify_modify(file->f_path.dentry);
+       fsnotify_modify(file);
   
         /* clear setuid/setgid flag after write */
         if (inode->i_mode & (S_ISUID | S_ISGID))
@@@ -1069,45 -1086,8 +1069,45 @@@ out
    * on entry. On return, *count contains the number of bytes actually read.
    * N.B. After this call fhp needs an fh_put
    */
+ +__be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
+ +      loff_t offset, struct kvec *vec, int vlen, unsigned long *count)
+ +{
+ +      struct file *file;
+ +      struct inode *inode;
+ +      struct raparms  *ra;
+ +      __be32 err;
+ +
+ +      err = nfsd_open(rqstp, fhp, S_IFREG, NFSD_MAY_READ, &file);
+ +      if (err)
+ +              return err;
+ +
+ +      inode = file->f_path.dentry->d_inode;
+ +
+ +      /* Get readahead parameters */
+ +      ra = nfsd_get_raparms(inode->i_sb->s_dev, inode->i_ino);
+ +
+ +      if (ra && ra->p_set)
+ +              file->f_ra = ra->p_ra;
+ +
+ +      err = nfsd_vfs_read(rqstp, fhp, file, offset, vec, vlen, count);
+ +
+ +      /* Write back readahead params */
+ +      if (ra) {
+ +              struct raparm_hbucket *rab = &raparm_hash[ra->p_hindex];
+ +              spin_lock(&rab->pb_lock);
+ +              ra->p_ra = file->f_ra;
+ +              ra->p_set = 1;
+ +              ra->p_count--;
+ +              spin_unlock(&rab->pb_lock);
+ +      }
+ +
+ +      nfsd_close(file);
+ +      return err;
+ +}
+ +
+ +/* As above, but use the provided file descriptor. */
   __be32
- -nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp, struct file *file,
+ +nfsd_read_file(struct svc_rqst *rqstp, struct svc_fh *fhp, struct file *file,
                 loff_t offset, struct kvec *vec, int vlen,
                 unsigned long *count)
   {
@@@ -1119,8 -1099,13 +1119,8 @@@
                 if (err)
                         goto out;
                 err = nfsd_vfs_read(rqstp, fhp, file, offset, vec, vlen, count);
- -      } else {
- -              err = nfsd_open(rqstp, fhp, S_IFREG, NFSD_MAY_READ, &file);
- -              if (err)
- -                      goto out;
- -              err = nfsd_vfs_read(rqstp, fhp, file, offset, vec, vlen, count);
- -              nfsd_close(file);
- -      }
+ +      } else /* Note file may still be NULL in NFSv4 special stateid case: */
+ +              err = nfsd_read(rqstp, fhp, offset, vec, vlen, count);
   out:
         return err;
   }
@@@ -1646,7 -1631,7 +1646,7 @@@ nfsd_link(struct svc_rqst *rqstp, struc
                                 char *name, int len, struct svc_fh *tfhp)
   {
         struct dentry   *ddir, *dnew, *dold;
- -      struct inode    *dirp, *dest;
+ +      struct inode    *dirp;
         __be32          err;
         int             host_err;
   
@@@ -1674,6 -1659,7 +1674,6 @@@
                 goto out_nfserr;
   
         dold = tfhp->fh_dentry;
- -      dest = dold->d_inode;
   
         host_err = mnt_want_write(tfhp->fh_export->ex_path.mnt);
         if (host_err) {
@@@ -2033,14 -2019,8 +2033,14 @@@ out
   __be32
   nfsd_statfs(struct svc_rqst *rqstp, struct svc_fh *fhp, struct kstatfs *stat, int access)
   {
- -      __be32 err = fh_verify(rqstp, fhp, 0, NFSD_MAY_NOP | access);
- -      if (!err && vfs_statfs(fhp->fh_dentry,stat))
+ +      struct path path = {
+ +              .mnt    = fhp->fh_export->ex_path.mnt,
+ +              .dentry = fhp->fh_dentry,
+ +      };
+ +      __be32 err;
+ +
+ +      err = fh_verify(rqstp, fhp, 0, NFSD_MAY_NOP | access);
+ +      if (!err && vfs_statfs(&path, stat))
                 err = nfserr_io;
         return err;
   }
@@@ -2058,6 -2038,7 +2058,6 @@@ nfsd_permission(struct svc_rqst *rqstp
                                         struct dentry *dentry, int acc)
   {
         struct inode    *inode = dentry->d_inode;
- -      struct path     path;
         int             err;
   
         if (acc == NFSD_MAY_NOP)
@@@ -2130,7 -2111,15 +2130,7 @@@
         if (err == -EACCES && S_ISREG(inode->i_mode) &&
             acc == (NFSD_MAY_READ | NFSD_MAY_OWNER_OVERRIDE))
                 err = inode_permission(inode, MAY_EXEC);
- -      if (err)
- -              goto nfsd_out;
   
- -      /* Do integrity (permission) checking now, but defer incrementing
- -       * IMA counts to the actual file open.
- -       */
- -      path.mnt = exp->ex_path.mnt;
- -      path.dentry = dentry;
- -nfsd_out:
         return err? nfserrno(err) : 0;
   }
   
diff --combined fs/notify/inode_mark.c

index 152b83e,37b460f..33297c0
--- 1/fs/notify/inode_mark.c
--- 2/fs/notify/inode_mark.c
+++ b/fs/notify/inode_mark.c
@@@ -16,72 -16,6 +16,6 @@@
    *  the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA.
    */
   
- /*
-  * fsnotify inode mark locking/lifetime/and refcnting
-  *
-  * REFCNT:
-  * The mark->refcnt tells how many "things" in the kernel currently are
-  * referencing this object.  The object typically will live inside the kernel
-  * with a refcnt of 2, one for each list it is on (i_list, g_list).  Any task
-  * which can find this object holding the appropriete locks, can take a reference
-  * and the object itself is guarenteed to survive until the reference is dropped.
-  *
-  * LOCKING:
-  * There are 3 spinlocks involved with fsnotify inode marks and they MUST
-  * be taken in order as follows:
-  *
-  * entry->lock
-  * group->mark_lock
-  * inode->i_lock
-  *
-  * entry->lock protects 2 things, entry->group and entry->inode.  You must hold
-  * that lock to dereference either of these things (they could be NULL even with
-  * the lock)
-  *
-  * group->mark_lock protects the mark_entries list anchored inside a given group
-  * and each entry is hooked via the g_list.  It also sorta protects the
-  * free_g_list, which when used is anchored by a private list on the stack of the
-  * task which held the group->mark_lock.
-  *
-  * inode->i_lock protects the i_fsnotify_mark_entries list anchored inside a
-  * given inode and each entry is hooked via the i_list. (and sorta the
-  * free_i_list)
-  *
-  *
-  * LIFETIME:
-  * Inode marks survive between when they are added to an inode and when their
-  * refcnt==0.
-  *
-  * The inode mark can be cleared for a number of different reasons including:
-  * - The inode is unlinked for the last time.  (fsnotify_inode_remove)
-  * - The inode is being evicted from cache. (fsnotify_inode_delete)
-  * - The fs the inode is on is unmounted.  (fsnotify_inode_delete/fsnotify_unmount_inodes)
-  * - Something explicitly requests that it be removed.  (fsnotify_destroy_mark_by_entry)
-  * - The fsnotify_group associated with the mark is going away and all such marks
-  *   need to be cleaned up. (fsnotify_clear_marks_by_group)
-  *
-  * Worst case we are given an inode and need to clean up all the marks on that
-  * inode.  We take i_lock and walk the i_fsnotify_mark_entries safely.  For each
-  * mark on the list we take a reference (so the mark can't disappear under us).
-  * We remove that mark form the inode's list of marks and we add this mark to a
-  * private list anchored on the stack using i_free_list;  At this point we no
-  * longer fear anything finding the mark using the inode's list of marks.
-  *
-  * We can safely and locklessly run the private list on the stack of everything
-  * we just unattached from the original inode.  For each mark on the private list
-  * we grab the mark-> and can thus dereference mark->group and mark->inode.  If
-  * we see the group and inode are not NULL we take those locks.  Now holding all
-  * 3 locks we can completely remove the mark from other tasks finding it in the
-  * future.  Remember, 10 things might already be referencing this mark, but they
-  * better be holding a ref.  We drop our reference we took before we unhooked it
-  * from the inode.  When the ref hits 0 we can free the mark.
-  *
-  * Very similarly for freeing by group, except we use free_g_list.
-  *
-  * This has the very interesting property of being able to run concurrently with
-  * any (or all) other directions.
-  */
- 
   #include <linux/fs.h>
   #include <linux/init.h>
   #include <linux/kernel.h>
@@@ -95,30 -29,19 +29,19 @@@
   #include <linux/fsnotify_backend.h>
   #include "fsnotify.h"
   
- void fsnotify_get_mark(struct fsnotify_mark_entry *entry)
- {
-       atomic_inc(&entry->refcnt);
- }
- 
- void fsnotify_put_mark(struct fsnotify_mark_entry *entry)
- {
-       if (atomic_dec_and_test(&entry->refcnt))
-               entry->free_mark(entry);
- }
- 
   /*
    * Recalculate the mask of events relevant to a given inode locked.
    */
   static void fsnotify_recalc_inode_mask_locked(struct inode *inode)
   {
-       struct fsnotify_mark_entry *entry;
+       struct fsnotify_mark *mark;
         struct hlist_node *pos;
         __u32 new_mask = 0;
   
         assert_spin_locked(&inode->i_lock);
   
-       hlist_for_each_entry(entry, pos, &inode->i_fsnotify_mark_entries, i_list)
-               new_mask |= entry->mask;
+       hlist_for_each_entry(mark, pos, &inode->i_fsnotify_marks, i.i_list)
+               new_mask |= mark->mask;
         inode->i_fsnotify_mask = new_mask;
   }
   
@@@ -135,107 -58,26 +58,26 @@@ void fsnotify_recalc_inode_mask(struct 
         __fsnotify_update_child_dentry_flags(inode);
   }
   
- /*
-  * Any time a mark is getting freed we end up here.
-  * The caller had better be holding a reference to this mark so we don't actually
-  * do the final put under the entry->lock
-  */
- void fsnotify_destroy_mark_by_entry(struct fsnotify_mark_entry *entry)
+ void fsnotify_destroy_inode_mark(struct fsnotify_mark *mark)
   {
-       struct fsnotify_group *group;
-       struct inode *inode;
- 
-       spin_lock(&entry->lock);
- 
-       group = entry->group;
-       inode = entry->inode;
- 
-       BUG_ON(group && !inode);
-       BUG_ON(!group && inode);
- 
-       /* if !group something else already marked this to die */
-       if (!group) {
-               spin_unlock(&entry->lock);
-               return;
-       }
+       struct inode *inode = mark->i.inode;
   
-       /* 1 from caller and 1 for being on i_list/g_list */
-       BUG_ON(atomic_read(&entry->refcnt) < 2);
+       assert_spin_locked(&mark->lock);
+       assert_spin_locked(&mark->group->mark_lock);
   
-       spin_lock(&group->mark_lock);
         spin_lock(&inode->i_lock);
   
-       hlist_del_init(&entry->i_list);
-       entry->inode = NULL;
- 
-       list_del_init(&entry->g_list);
-       entry->group = NULL;
- 
-       fsnotify_put_mark(entry); /* for i_list and g_list */
+       hlist_del_init_rcu(&mark->i.i_list);
+       mark->i.inode = NULL;
   
         /*
-        * this mark is now off the inode->i_fsnotify_mark_entries list and we
+        * this mark is now off the inode->i_fsnotify_marks list and we
          * hold the inode->i_lock, so this is the perfect time to update the
          * inode->i_fsnotify_mask
          */
         fsnotify_recalc_inode_mask_locked(inode);
   
         spin_unlock(&inode->i_lock);
-       spin_unlock(&group->mark_lock);
-       spin_unlock(&entry->lock);
- 
-       /*
-        * Some groups like to know that marks are being freed.  This is a
-        * callback to the group function to let it know that this entry
-        * is being freed.
-        */
-       if (group->ops->freeing_mark)
-               group->ops->freeing_mark(entry, group);
- 
-       /*
-        * __fsnotify_update_child_dentry_flags(inode);
-        *
-        * I really want to call that, but we can't, we have no idea if the inode
-        * still exists the second we drop the entry->lock.
-        *
-        * The next time an event arrive to this inode from one of it's children
-        * __fsnotify_parent will see that the inode doesn't care about it's
-        * children and will update all of these flags then.  So really this
-        * is just a lazy update (and could be a perf win...)
-        */
- 
- 
-       iput(inode);
- 
-       /*
-        * it's possible that this group tried to destroy itself, but this
-        * this mark was simultaneously being freed by inode.  If that's the
-        * case, we finish freeing the group here.
-        */
-       if (unlikely(atomic_dec_and_test(&group->num_marks)))
-               fsnotify_final_destroy_group(group);
- }
- 
- /*
-  * Given a group, destroy all of the marks associated with that group.
-  */
- void fsnotify_clear_marks_by_group(struct fsnotify_group *group)
- {
-       struct fsnotify_mark_entry *lentry, *entry;
-       LIST_HEAD(free_list);
- 
-       spin_lock(&group->mark_lock);
-       list_for_each_entry_safe(entry, lentry, &group->mark_entries, g_list) {
-               list_add(&entry->free_g_list, &free_list);
-               list_del_init(&entry->g_list);
-               fsnotify_get_mark(entry);
-       }
-       spin_unlock(&group->mark_lock);
- 
-       list_for_each_entry_safe(entry, lentry, &free_list, free_g_list) {
-               fsnotify_destroy_mark_by_entry(entry);
-               fsnotify_put_mark(entry);
-       }
   }
   
   /*
@@@ -243,112 -85,145 +85,145 @@@
    */
   void fsnotify_clear_marks_by_inode(struct inode *inode)
   {
-       struct fsnotify_mark_entry *entry, *lentry;
+       struct fsnotify_mark *mark, *lmark;
         struct hlist_node *pos, *n;
         LIST_HEAD(free_list);
   
         spin_lock(&inode->i_lock);
-       hlist_for_each_entry_safe(entry, pos, n, &inode->i_fsnotify_mark_entries, i_list) {
-               list_add(&entry->free_i_list, &free_list);
-               hlist_del_init(&entry->i_list);
-               fsnotify_get_mark(entry);
+       hlist_for_each_entry_safe(mark, pos, n, &inode->i_fsnotify_marks, i.i_list) {
+               list_add(&mark->i.free_i_list, &free_list);
+               hlist_del_init_rcu(&mark->i.i_list);
+               fsnotify_get_mark(mark);
         }
         spin_unlock(&inode->i_lock);
   
-       list_for_each_entry_safe(entry, lentry, &free_list, free_i_list) {
-               fsnotify_destroy_mark_by_entry(entry);
-               fsnotify_put_mark(entry);
+       list_for_each_entry_safe(mark, lmark, &free_list, i.free_i_list) {
+               fsnotify_destroy_mark(mark);
+               fsnotify_put_mark(mark);
         }
   }
   
+ /*
+  * Given a group clear all of the inode marks associated with that group.
+  */
+ void fsnotify_clear_inode_marks_by_group(struct fsnotify_group *group)
+ {
+       fsnotify_clear_marks_by_group_flags(group, FSNOTIFY_MARK_FLAG_INODE);
+ }
+ 
   /*
    * given a group and inode, find the mark associated with that combination.
    * if found take a reference to that mark and return it, else return NULL
    */
- struct fsnotify_mark_entry *fsnotify_find_mark_entry(struct fsnotify_group *group,
-                                                    struct inode *inode)
+ struct fsnotify_mark *fsnotify_find_inode_mark_locked(struct fsnotify_group *group,
+                                                     struct inode *inode)
   {
-       struct fsnotify_mark_entry *entry;
+       struct fsnotify_mark *mark;
         struct hlist_node *pos;
   
         assert_spin_locked(&inode->i_lock);
   
-       hlist_for_each_entry(entry, pos, &inode->i_fsnotify_mark_entries, i_list) {
-               if (entry->group == group) {
-                       fsnotify_get_mark(entry);
-                       return entry;
+       hlist_for_each_entry(mark, pos, &inode->i_fsnotify_marks, i.i_list) {
+               if (mark->group == group) {
+                       fsnotify_get_mark(mark);
+                       return mark;
                 }
         }
         return NULL;
   }
   
   /*
-  * Nothing fancy, just initialize lists and locks and counters.
+  * given a group and inode, find the mark associated with that combination.
+  * if found take a reference to that mark and return it, else return NULL
    */
- void fsnotify_init_mark(struct fsnotify_mark_entry *entry,
-                       void (*free_mark)(struct fsnotify_mark_entry *entry))
+ struct fsnotify_mark *fsnotify_find_inode_mark(struct fsnotify_group *group,
+                                              struct inode *inode)
+ {
+       struct fsnotify_mark *mark;
+ 
+       spin_lock(&inode->i_lock);
+       mark = fsnotify_find_inode_mark_locked(group, inode);
+       spin_unlock(&inode->i_lock);
+ 
+       return mark;
+ }
   
+ /*
+  * If we are setting a mark mask on an inode mark we should pin the inode
+  * in memory.
+  */
+ void fsnotify_set_inode_mark_mask_locked(struct fsnotify_mark *mark,
+                                        __u32 mask)
   {
-       spin_lock_init(&entry->lock);
-       atomic_set(&entry->refcnt, 1);
-       INIT_HLIST_NODE(&entry->i_list);
-       entry->group = NULL;
-       entry->mask = 0;
-       entry->inode = NULL;
-       entry->free_mark = free_mark;
+       struct inode *inode;
+ 
+       assert_spin_locked(&mark->lock);
+ 
+       if (mask &&
+           mark->i.inode &&
+           !(mark->flags & FSNOTIFY_MARK_FLAG_OBJECT_PINNED)) {
+               mark->flags |= FSNOTIFY_MARK_FLAG_OBJECT_PINNED;
+               inode = igrab(mark->i.inode);
+               /*
+                * we shouldn't be able to get here if the inode wasn't
+                * already safely held in memory.  But bug in case it
+                * ever is wrong.
+                */
+               BUG_ON(!inode);
+       }
   }
   
   /*
-  * Attach an initialized mark entry to a given group and inode.
+  * Attach an initialized mark to a given inode.
    * These marks may be used for the fsnotify backend to determine which
-  * event types should be delivered to which group and for which inodes.
+  * event types should be delivered to which group and for which inodes.  These
+  * marks are ordered according to the group's location in memory.
    */
- int fsnotify_add_mark(struct fsnotify_mark_entry *entry,
-                     struct fsnotify_group *group, struct inode *inode)
+ int fsnotify_add_inode_mark(struct fsnotify_mark *mark,
+                           struct fsnotify_group *group, struct inode *inode,
+                           int allow_dups)
   {
-       struct fsnotify_mark_entry *lentry;
+       struct fsnotify_mark *lmark;
+       struct hlist_node *node, *last = NULL;
         int ret = 0;
   
-       inode = igrab(inode);
-       if (unlikely(!inode))
-               return -EINVAL;
+       mark->flags |= FSNOTIFY_MARK_FLAG_INODE;
+ 
+       assert_spin_locked(&mark->lock);
+       assert_spin_locked(&group->mark_lock);
   
-       /*
-        * LOCKING ORDER!!!!
-        * entry->lock
-        * group->mark_lock
-        * inode->i_lock
-        */
-       spin_lock(&entry->lock);
-       spin_lock(&group->mark_lock);
         spin_lock(&inode->i_lock);
   
-       lentry = fsnotify_find_mark_entry(group, inode);
-       if (!lentry) {
-               entry->group = group;
-               entry->inode = inode;
+       mark->i.inode = inode;
   
-               hlist_add_head(&entry->i_list, &inode->i_fsnotify_mark_entries);
-               list_add(&entry->g_list, &group->mark_entries);
+       /* is mark the first mark? */
+       if (hlist_empty(&inode->i_fsnotify_marks)) {
+               hlist_add_head_rcu(&mark->i.i_list, &inode->i_fsnotify_marks);
+               goto out;
+       }
   
-               fsnotify_get_mark(entry); /* for i_list and g_list */
+       /* should mark be in the middle of the current list? */
+       hlist_for_each_entry(lmark, node, &inode->i_fsnotify_marks, i.i_list) {
+               last = node;
   
-               atomic_inc(&group->num_marks);
+               if ((lmark->group == group) && !allow_dups) {
+                       ret = -EEXIST;
+                       goto out;
+               }
+ 
+               if (mark->group < lmark->group)
+                       continue;
   
-               fsnotify_recalc_inode_mask_locked(inode);
+               hlist_add_before_rcu(&mark->i.i_list, &lmark->i.i_list);
+               goto out;
         }
   
+       BUG_ON(last == NULL);
+       /* mark should be the last entry.  last is the current last entry */
+       hlist_add_after_rcu(last, &mark->i.i_list);
+ out:
+       fsnotify_recalc_inode_mask_locked(inode);
         spin_unlock(&inode->i_lock);
-       spin_unlock(&group->mark_lock);
-       spin_unlock(&entry->lock);
- 
-       if (lentry) {
-               ret = -EEXIST;
-               iput(inode);
-               fsnotify_put_mark(lentry);
-       } else {
-               __fsnotify_update_child_dentry_flags(inode);
-       }
   
         return ret;
   }
@@@ -369,11 -244,11 +244,11 @@@ void fsnotify_unmount_inodes(struct lis
                 struct inode *need_iput_tmp;
   
                 /*
- -               * We cannot __iget() an inode in state I_CLEAR, I_FREEING,
+ +               * We cannot __iget() an inode in state I_FREEING,
                  * I_WILL_FREE, or I_NEW which is fine because by that point
                  * the inode cannot have any associated watches.
                  */
- -              if (inode->i_state & (I_CLEAR|I_FREEING|I_WILL_FREE|I_NEW))
+ +              if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW))
                         continue;
   
                 /*
@@@ -397,7 -272,7 +272,7 @@@
                 /* In case the dropping of a reference would nuke next_i. */
                 if ((&next_i->i_sb_list != list) &&
                     atomic_read(&next_i->i_count) &&
- -                  !(next_i->i_state & (I_CLEAR | I_FREEING | I_WILL_FREE))) {
+ +                  !(next_i->i_state & (I_FREEING | I_WILL_FREE))) {
                         __iget(next_i);
                         need_iput = next_i;
                 }
diff --combined fs/open.c

index 0d1fa3d,bf08263..b715d06
--- 1/fs/open.c
--- 2/fs/open.c
+++ b/fs/open.c
@@@ -29,6 -29,7 +29,7 @@@
   #include <linux/falloc.h>
   #include <linux/fs_struct.h>
   #include <linux/ima.h>
+ #include <linux/dnotify.h>
   
   #include "internal.h"
   
@@@ -110,7 -111,7 +111,7 @@@ static long do_sys_truncate(const char 
   
         error = locks_verify_truncate(inode, NULL, length);
         if (!error)
- -              error = security_path_truncate(&path, length, 0);
+ +              error = security_path_truncate(&path);
         if (!error)
                 error = do_truncate(path.dentry, length, 0, NULL);
   
@@@ -165,7 -166,8 +166,7 @@@ static long do_sys_ftruncate(unsigned i
   
         error = locks_verify_truncate(inode, file, length);
         if (!error)
- -              error = security_path_truncate(&file->f_path, length,
- -                                             ATTR_MTIME|ATTR_CTIME);
+ +              error = security_path_truncate(&file->f_path);
         if (!error)
                 error = do_truncate(dentry, length, ATTR_MTIME|ATTR_CTIME, file);
   out_putf:
@@@ -366,7 -368,7 +367,7 @@@ SYSCALL_DEFINE1(chdir, const char __use
         if (error)
                 goto out;
   
- -      error = inode_permission(path.dentry->d_inode, MAY_EXEC | MAY_ACCESS);
+ +      error = inode_permission(path.dentry->d_inode, MAY_EXEC | MAY_CHDIR);
         if (error)
                 goto dput_and_out;
   
@@@ -395,7 -397,7 +396,7 @@@ SYSCALL_DEFINE1(fchdir, unsigned int, f
         if (!S_ISDIR(inode->i_mode))
                 goto out_putf;
   
- -      error = inode_permission(inode, MAY_EXEC | MAY_ACCESS);
+ +      error = inode_permission(inode, MAY_EXEC | MAY_CHDIR);
         if (!error)
                 set_fs_pwd(current->fs, &file->f_path);
   out_putf:
@@@ -413,7 -415,7 +414,7 @@@ SYSCALL_DEFINE1(chroot, const char __us
         if (error)
                 goto out;
   
- -      error = inode_permission(path.dentry->d_inode, MAY_EXEC | MAY_ACCESS);
+ +      error = inode_permission(path.dentry->d_inode, MAY_EXEC | MAY_CHDIR);
         if (error)
                 goto dput_and_out;
   
@@@ -887,7 -889,7 +888,7 @@@ long do_sys_open(int dfd, const char __
                                 put_unused_fd(fd);
                                 fd = PTR_ERR(f);
                         } else {
-                               fsnotify_open(f->f_path.dentry);
+                               fsnotify_open(f);
                                 fd_install(fd, f);
                         }
                 }
diff --combined include/linux/Kbuild

index 9aa9bca,d5cca9a..2547daf
--- 1/include/linux/Kbuild
--- 2/include/linux/Kbuild
+++ b/include/linux/Kbuild
@@@ -210,6 -210,7 +210,7 @@@ unifdef-y += ethtool.
   unifdef-y += eventpoll.h
   unifdef-y += signalfd.h
   unifdef-y += ext2_fs.h
+ unifdef-y += fanotify.h
   unifdef-y += fb.h
   unifdef-y += fcntl.h
   unifdef-y += filter.h
@@@ -276,7 -277,6 +277,7 @@@ ifneq ($(wildcard $(srctree)/arch/$(SRC
                   $(srctree)/include/asm-$(SRCARCH)/kvm_para.h),)
   unifdef-y += kvm_para.h
   endif
+ +unifdef-y += l2tp.h
   unifdef-y += llc.h
   unifdef-y += loop.h
   unifdef-y += lp.h
diff --combined include/linux/fs.h

index 9e22101,d92c212..a8ccf85
--- 1/include/linux/fs.h
--- 2/include/linux/fs.h
+++ b/include/linux/fs.h
@@@ -53,7 -53,6 +53,7 @@@ struct inodes_stat_t 
   #define MAY_APPEND 8
   #define MAY_ACCESS 16
   #define MAY_OPEN 32
+ +#define MAY_CHDIR 64
   
   /*
    * flags in file.f_mode.  Note that FMODE_READ and FMODE_WRITE must correspond
@@@ -91,6 -90,9 +91,9 @@@
   /* Expect random access pattern */
   #define FMODE_RANDOM          ((__force fmode_t)0x1000)
   
+ /* File was opened by fanotify and shouldn't generate fanotify events */
+ #define FMODE_NONOTIFY                ((__force fmode_t)16777216) /* 0x1000000 */
+ 
   /*
    * The below are the various read and write types that we support. Some of
    * them include behavioral modifiers that send information down to the
@@@ -210,7 -212,6 +213,7 @@@
   #define MS_KERNMOUNT  (1<<22) /* this is a kern_mount call */
   #define MS_I_VERSION  (1<<23) /* Update inode I_version field */
   #define MS_STRICTATIME        (1<<24) /* Always perform atime updates */
+ +#define MS_BORN               (1<<29)
   #define MS_ACTIVE     (1<<30)
   #define MS_NOUSER     (1<<31)
   
@@@ -409,16 -410,12 +412,13 @@@ extern int get_max_files(void)
   extern int sysctl_nr_open;
   extern struct inodes_stat_t inodes_stat;
   extern int leases_enable, lease_break_time;
- #ifdef CONFIG_DNOTIFY
- extern int dir_notify_enable;
- #endif
   
   struct buffer_head;
   typedef int (get_block_t)(struct inode *inode, sector_t iblock,
                         struct buffer_head *bh_result, int create);
   typedef void (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
- -                      ssize_t bytes, void *private);
+ +                      ssize_t bytes, void *private, int ret,
+ +                      bool is_async);
   
   /*
    * Attribute flags.  These should be or-ed together to figure out what
@@@ -688,7 -685,6 +688,7 @@@ struct block_device 
    */
   #define PAGECACHE_TAG_DIRTY   0
   #define PAGECACHE_TAG_WRITEBACK       1
+ +#define PAGECACHE_TAG_TOWRITE 2
   
   int mapping_tagged(struct address_space *mapping, int tag);
   
@@@ -772,12 -768,7 +772,7 @@@ struct inode 
   
   #ifdef CONFIG_FSNOTIFY
         __u32                   i_fsnotify_mask; /* all events this inode cares about */
-       struct hlist_head       i_fsnotify_mark_entries; /* fsnotify mark entries */
- #endif
- 
- #ifdef CONFIG_INOTIFY
-       struct list_head        inotify_watches; /* watches on this inode */
-       struct mutex            inotify_mutex;  /* protects the watches list */
+       struct hlist_head       i_fsnotify_marks;
   #endif
   
         unsigned long           i_state;
@@@ -1565,8 -1556,8 +1560,8 @@@ struct super_operations 
   
         void (*dirty_inode) (struct inode *);
         int (*write_inode) (struct inode *, struct writeback_control *wbc);
- -      void (*drop_inode) (struct inode *);
- -      void (*delete_inode) (struct inode *);
+ +      int (*drop_inode) (struct inode *);
+ +      void (*evict_inode) (struct inode *);
         void (*put_super) (struct super_block *);
         void (*write_super) (struct super_block *);
         int (*sync_fs)(struct super_block *sb, int wait);
@@@ -1574,6 -1565,7 +1569,6 @@@
         int (*unfreeze_fs) (struct super_block *);
         int (*statfs) (struct dentry *, struct kstatfs *);
         int (*remount_fs) (struct super_block *, int *, char *);
- -      void (*clear_inode) (struct inode *);
         void (*umount_begin) (struct super_block *);
   
         int (*show_options)(struct seq_file *, struct vfsmount *);
@@@ -1618,8 -1610,8 +1613,8 @@@
    * I_FREEING          Set when inode is about to be freed but still has dirty
    *                    pages or buffers attached or the inode itself is still
    *                    dirty.
- - * I_CLEAR            Set by clear_inode().  In this state the inode is clean
- - *                    and can be destroyed.
+ + * I_CLEAR            Added by end_writeback().  In this state the inode is clean
+ + *                    and can be destroyed.  Inode keeps I_FREEING.
    *
    *                    Inodes that are I_WILL_FREE, I_FREEING or I_CLEAR are
    *                    prohibited for many purposes.  iget() must wait for
@@@ -1816,8 -1808,7 +1811,8 @@@ extern struct vfsmount *collect_mounts(
   extern void drop_collected_mounts(struct vfsmount *);
   extern int iterate_mounts(int (*)(struct vfsmount *, void *), void *,
                           struct vfsmount *);
- -extern int vfs_statfs(struct dentry *, struct kstatfs *);
+ +extern int vfs_statfs(struct path *, struct kstatfs *);
+ +extern int statfs_by_dentry(struct dentry *, struct kstatfs *);
   extern int freeze_super(struct super_block *super);
   extern int thaw_super(struct super_block *super);
   
@@@ -2167,8 -2158,9 +2162,8 @@@ extern void iput(struct inode *)
   extern struct inode * igrab(struct inode *);
   extern ino_t iunique(struct super_block *, ino_t);
   extern int inode_needs_sync(struct inode *inode);
- -extern void generic_delete_inode(struct inode *inode);
- -extern void generic_drop_inode(struct inode *inode);
- -extern int generic_detach_inode(struct inode *inode);
+ +extern int generic_delete_inode(struct inode *inode);
+ +extern int generic_drop_inode(struct inode *inode);
   
   extern struct inode *ilookup5_nowait(struct super_block *sb,
                 unsigned long hashval, int (*test)(struct inode *, void *),
@@@ -2185,7 -2177,7 +2180,7 @@@ extern void unlock_new_inode(struct ino
   
   extern void __iget(struct inode * inode);
   extern void iget_failed(struct inode *);
- -extern void clear_inode(struct inode *);
+ +extern void end_writeback(struct inode *);
   extern void destroy_inode(struct inode *);
   extern void __destroy_inode(struct inode *);
   extern struct inode *new_inode(struct super_block *);
@@@ -2271,6 -2263,16 +2266,6 @@@ static inline int xip_truncate_page(str
   struct bio;
   typedef void (dio_submit_t)(int rw, struct bio *bio, struct inode *inode,
                             loff_t file_offset);
- -void dio_end_io(struct bio *bio, int error);
- -
- -ssize_t __blockdev_direct_IO_newtrunc(int rw, struct kiocb *iocb, struct inode *inode,
- -      struct block_device *bdev, const struct iovec *iov, loff_t offset,
- -      unsigned long nr_segs, get_block_t get_block, dio_iodone_t end_io,
- -      dio_submit_t submit_io, int lock_type);
- -ssize_t __blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode,
- -      struct block_device *bdev, const struct iovec *iov, loff_t offset,
- -      unsigned long nr_segs, get_block_t get_block, dio_iodone_t end_io,
- -      dio_submit_t submit_io, int lock_type);
   
   enum {
         /* need locking between buffered and direct access */
@@@ -2280,13 -2282,24 +2275,13 @@@
         DIO_SKIP_HOLES  = 0x02,
   };
   
- -static inline ssize_t blockdev_direct_IO_newtrunc(int rw, struct kiocb *iocb,
- -      struct inode *inode, struct block_device *bdev, const struct iovec *iov,
- -      loff_t offset, unsigned long nr_segs, get_block_t get_block,
- -      dio_iodone_t end_io)
- -{
- -      return __blockdev_direct_IO_newtrunc(rw, iocb, inode, bdev, iov, offset,
- -                                  nr_segs, get_block, end_io, NULL,
- -                                  DIO_LOCKING | DIO_SKIP_HOLES);
- -}
+ +void dio_end_io(struct bio *bio, int error);
+ +
+ +ssize_t __blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode,
+ +      struct block_device *bdev, const struct iovec *iov, loff_t offset,
+ +      unsigned long nr_segs, get_block_t get_block, dio_iodone_t end_io,
+ +      dio_submit_t submit_io, int flags);
   
- -static inline ssize_t blockdev_direct_IO_no_locking_newtrunc(int rw, struct kiocb *iocb,
- -      struct inode *inode, struct block_device *bdev, const struct iovec *iov,
- -      loff_t offset, unsigned long nr_segs, get_block_t get_block,
- -      dio_iodone_t end_io)
- -{
- -      return __blockdev_direct_IO_newtrunc(rw, iocb, inode, bdev, iov, offset,
- -                              nr_segs, get_block, end_io, NULL, 0);
- -}
   static inline ssize_t blockdev_direct_IO(int rw, struct kiocb *iocb,
         struct inode *inode, struct block_device *bdev, const struct iovec *iov,
         loff_t offset, unsigned long nr_segs, get_block_t get_block,
@@@ -2296,6 -2309,15 +2291,6 @@@
                                     nr_segs, get_block, end_io, NULL,
                                     DIO_LOCKING | DIO_SKIP_HOLES);
   }
- -
- -static inline ssize_t blockdev_direct_IO_no_locking(int rw, struct kiocb *iocb,
- -      struct inode *inode, struct block_device *bdev, const struct iovec *iov,
- -      loff_t offset, unsigned long nr_segs, get_block_t get_block,
- -      dio_iodone_t end_io)
- -{
- -      return __blockdev_direct_IO(rw, iocb, inode, bdev, iov, offset,
- -                                  nr_segs, get_block, end_io, NULL, 0);
- -}
   #endif
   
   extern const struct file_operations generic_ro_fops;
@@@ -2357,6 -2379,7 +2352,6 @@@ extern int simple_link(struct dentry *
   extern int simple_unlink(struct inode *, struct dentry *);
   extern int simple_rmdir(struct inode *, struct dentry *);
   extern int simple_rename(struct inode *, struct dentry *, struct inode *, struct dentry *);
- -extern int simple_setsize(struct inode *, loff_t);
   extern int noop_fsync(struct file *, int);
   extern int simple_empty(struct dentry *);
   extern int simple_readpage(struct file *file, struct page *page);
@@@ -2393,7 -2416,8 +2388,7 @@@ extern int buffer_migrate_page(struct a
   
   extern int inode_change_ok(const struct inode *, struct iattr *);
   extern int inode_newsize_ok(const struct inode *, loff_t offset);
- -extern int __must_check inode_setattr(struct inode *, const struct iattr *);
- -extern void generic_setattr(struct inode *inode, const struct iattr *attr);
+ +extern void setattr_copy(struct inode *inode, const struct iattr *attr);
   
   extern void file_update_time(struct file *file);
   
@@@ -2484,7 -2508,8 +2479,8 @@@ int proc_nr_files(struct ctl_table *tab
   int __init get_filesystem_list(char *buf);
   
   #define ACC_MODE(x) ("\004\002\006\006"[(x)&O_ACCMODE])
- #define OPEN_FMODE(flag) ((__force fmode_t)((flag + 1) & O_ACCMODE))
+ #define OPEN_FMODE(flag) ((__force fmode_t)(((flag + 1) & O_ACCMODE) | \
+                                           (flag & FMODE_NONOTIFY)))
   
   #endif /* __KERNEL__ */
   #endif /* _LINUX_FS_H */
diff --combined include/linux/security.h

index 723a93d,24fc295..5bcb395
--- 1/include/linux/security.h
--- 2/include/linux/security.h
+++ b/include/linux/security.h
@@@ -23,6 -23,7 +23,7 @@@
   #define __LINUX_SECURITY_H
   
   #include <linux/fs.h>
+ #include <linux/fsnotify.h>
   #include <linux/binfmts.h>
   #include <linux/signal.h>
   #include <linux/resource.h>
@@@ -470,6 -471,8 +471,6 @@@ static inline void security_free_mnt_op
    * @path_truncate:
    *    Check permission before truncating a file.
    *    @path contains the path structure for the file.
- - *    @length is the new length of the file.
- - *    @time_attrs is the flags passed to do_truncate().
    *    Return 0 if permission is granted.
    * @inode_getattr:
    *    Check permission before obtaining file attributes.
@@@ -1410,7 -1413,8 +1411,7 @@@ struct security_operations 
         int (*path_rmdir) (struct path *dir, struct dentry *dentry);
         int (*path_mknod) (struct path *dir, struct dentry *dentry, int mode,
                            unsigned int dev);
- -      int (*path_truncate) (struct path *path, loff_t length,
- -                            unsigned int time_attrs);
+ +      int (*path_truncate) (struct path *path);
         int (*path_symlink) (struct path *dir, struct dentry *dentry,
                              const char *old_name);
         int (*path_link) (struct dentry *old_dentry, struct path *new_dir,
@@@ -2803,7 -2807,8 +2804,7 @@@ int security_path_mkdir(struct path *di
   int security_path_rmdir(struct path *dir, struct dentry *dentry);
   int security_path_mknod(struct path *dir, struct dentry *dentry, int mode,
                         unsigned int dev);
- -int security_path_truncate(struct path *path, loff_t length,
- -                         unsigned int time_attrs);
+ +int security_path_truncate(struct path *path);
   int security_path_symlink(struct path *dir, struct dentry *dentry,
                           const char *old_name);
   int security_path_link(struct dentry *old_dentry, struct path *new_dir,
@@@ -2837,7 -2842,8 +2838,7 @@@ static inline int security_path_mknod(s
         return 0;
   }
   
- -static inline int security_path_truncate(struct path *path, loff_t length,
- -                                       unsigned int time_attrs)
+ +static inline int security_path_truncate(struct path *path)
   {
         return 0;
   }
diff --combined include/linux/syscalls.h

index a6bfd13,0ec26a7..2ab198a
--- 1/include/linux/syscalls.h
--- 2/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@@ -167,6 -167,7 +167,6 @@@ extern struct trace_event_functions exi
                 .enter_event    = &event_enter_##sname,         \
                 .exit_event     = &event_exit_##sname,          \
                 .enter_fields   = LIST_HEAD_INIT(__syscall_meta_##sname.enter_fields), \
- -              .exit_fields    = LIST_HEAD_INIT(__syscall_meta_##sname.exit_fields), \
         };
   
   #define SYSCALL_DEFINE0(sname)                                        \
@@@ -181,6 -182,7 +181,6 @@@
                 .enter_event    = &event_enter__##sname,        \
                 .exit_event     = &event_exit__##sname,         \
                 .enter_fields   = LIST_HEAD_INIT(__syscall_meta__##sname.enter_fields), \
- -              .exit_fields    = LIST_HEAD_INIT(__syscall_meta__##sname.exit_fields), \
         };                                                      \
         asmlinkage long sys_##sname(void)
   #else
@@@ -811,6 -813,10 +811,10 @@@ asmlinkage long sys_pselect6(int, fd_se
   asmlinkage long sys_ppoll(struct pollfd __user *, unsigned int,
                           struct timespec __user *, const sigset_t __user *,
                           size_t);
+ asmlinkage long sys_fanotify_init(unsigned int flags, unsigned int event_f_flags);
+ asmlinkage long sys_fanotify_mark(int fanotify_fd, unsigned int flags,
+                                 u64 mask, int fd,
+                                 const char  __user *pathname);
   
   int kernel_execve(const char *filename, char *const argv[], char *const envp[]);
   
diff --combined init/Kconfig

index cb64c58,05e932e..24932b9
--- 1/init/Kconfig
--- 2/init/Kconfig
+++ b/init/Kconfig
@@@ -320,13 -320,17 +320,17 @@@ config AUDITSYSCAL
         help
           Enable low-overhead system-call auditing infrastructure that
           can be used independently or with another kernel subsystem,
-         such as SELinux.  To use audit's filesystem watch feature, please
-         ensure that INOTIFY is configured.
+         such as SELinux.
+ 
+ config AUDIT_WATCH
+       def_bool y
+       depends on AUDITSYSCALL
+       select FSNOTIFY
   
   config AUDIT_TREE
         def_bool y
         depends on AUDITSYSCALL
-       select INOTIFY
+       select FSNOTIFY
   
   menu "RCU Subsystem"
   
@@@ -1143,6 -1147,30 +1147,6 @@@ config TRACEPOINT
   
   source "arch/Kconfig"
   
- -config SLOW_WORK
- -      default n
- -      bool
- -      help
- -        The slow work thread pool provides a number of dynamically allocated
- -        threads that can be used by the kernel to perform operations that
- -        take a relatively long time.
- -
- -        An example of this would be CacheFiles doing a path lookup followed
- -        by a series of mkdirs and a create call, all of which have to touch
- -        disk.
- -
- -        See Documentation/slow-work.txt.
- -
- -config SLOW_WORK_DEBUG
- -      bool "Slow work debugging through debugfs"
- -      default n
- -      depends on SLOW_WORK && DEBUG_FS
- -      help
- -        Display the contents of the slow work run queue through debugfs,
- -        including items currently executing.
- -
- -        See Documentation/slow-work.txt.
- -
   endmenu               # General setup
   
   config HAVE_GENERIC_DMA_COHERENT
diff --combined kernel/Makefile

index c53e491,202df4e..0b72d1a
--- 1/kernel/Makefile
--- 2/kernel/Makefile
+++ b/kernel/Makefile
@@@ -70,14 -70,15 +70,15 @@@ obj-$(CONFIG_IKCONFIG) += configs.
   obj-$(CONFIG_RESOURCE_COUNTERS) += res_counter.o
   obj-$(CONFIG_SMP) += stop_machine.o
   obj-$(CONFIG_KPROBES_SANITY_TEST) += test_kprobes.o
- obj-$(CONFIG_AUDIT) += audit.o auditfilter.o audit_watch.o
+ obj-$(CONFIG_AUDIT) += audit.o auditfilter.o
   obj-$(CONFIG_AUDITSYSCALL) += auditsc.o
- obj-$(CONFIG_GCOV_KERNEL) += gcov/
+ obj-$(CONFIG_AUDIT_WATCH) += audit_watch.o
   obj-$(CONFIG_AUDIT_TREE) += audit_tree.o
+ obj-$(CONFIG_GCOV_KERNEL) += gcov/
   obj-$(CONFIG_KPROBES) += kprobes.o
   obj-$(CONFIG_KGDB) += debug/
- -obj-$(CONFIG_DETECT_SOFTLOCKUP) += softlockup.o
   obj-$(CONFIG_DETECT_HUNG_TASK) += hung_task.o
+ +obj-$(CONFIG_LOCKUP_DETECTOR) += watchdog.o
   obj-$(CONFIG_GENERIC_HARDIRQS) += irq/
   obj-$(CONFIG_SECCOMP) += seccomp.o
   obj-$(CONFIG_RCU_TORTURE_TEST) += rcutorture.o
@@@ -99,6 -100,8 +100,6 @@@ obj-$(CONFIG_TRACING) += trace
   obj-$(CONFIG_X86_DS) += trace/
   obj-$(CONFIG_RING_BUFFER) += trace/
   obj-$(CONFIG_SMP) += sched_cpupri.o
- -obj-$(CONFIG_SLOW_WORK) += slow-work.o
- -obj-$(CONFIG_SLOW_WORK_DEBUG) += slow-work-debugfs.o
   obj-$(CONFIG_PERF_EVENTS) += perf_event.o
   obj-$(CONFIG_HAVE_HW_BREAKPOINT) += hw_breakpoint.o
   obj-$(CONFIG_USER_RETURN_NOTIFIER) += user-return-notifier.o
diff --combined kernel/audit.c

index 8296aa5,05a32f0..d960457
--- 1/kernel/audit.c
--- 2/kernel/audit.c
+++ b/kernel/audit.c
@@@ -56,7 -56,6 +56,6 @@@
   #include <net/netlink.h>
   #include <linux/skbuff.h>
   #include <linux/netlink.h>
- #include <linux/inotify.h>
   #include <linux/freezer.h>
   #include <linux/tty.h>
   
@@@ -407,7 -406,7 +406,7 @@@ static void kauditd_send_skb(struct sk_
                 audit_hold_skb(skb);
         } else
                 /* drop the extra reference if sent ok */
- -              kfree_skb(skb);
+ +              consume_skb(skb);
   }
   
   static int kauditd_thread(void *dummy)
diff --combined kernel/sysctl.c

index 6b005e4,fe30db7..ca38e8e
--- 1/kernel/sysctl.c
--- 2/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@@ -44,16 -44,17 +44,17 @@@
   #include <linux/times.h>
   #include <linux/limits.h>
   #include <linux/dcache.h>
+ #include <linux/dnotify.h>
   #include <linux/syscalls.h>
   #include <linux/vmstat.h>
   #include <linux/nfs_fs.h>
   #include <linux/acpi.h>
   #include <linux/reboot.h>
   #include <linux/ftrace.h>
- -#include <linux/slow-work.h>
   #include <linux/perf_event.h>
   #include <linux/kprobes.h>
   #include <linux/pipe_fs_i.h>
+ +#include <linux/oom.h>
   
   #include <asm/uaccess.h>
   #include <asm/processor.h>
@@@ -76,16 -77,15 +77,16 @@@
   #include <scsi/sg.h>
   #endif
   
+ +#ifdef CONFIG_LOCKUP_DETECTOR
+ +#include <linux/nmi.h>
+ +#endif
+ +
   
   #if defined(CONFIG_SYSCTL)
   
   /* External variables not in a header file. */
   extern int sysctl_overcommit_memory;
   extern int sysctl_overcommit_ratio;
- -extern int sysctl_panic_on_oom;
- -extern int sysctl_oom_kill_allocating_task;
- -extern int sysctl_oom_dump_tasks;
   extern int max_threads;
   extern int core_uses_pid;
   extern int suid_dumpable;
@@@ -107,7 -107,7 +108,7 @@@ extern int blk_iopoll_enabled
   #endif
   
   /* Constants used for minimum and  maximum */
- -#ifdef CONFIG_DETECT_SOFTLOCKUP
+ +#ifdef CONFIG_LOCKUP_DETECTOR
   static int sixty = 60;
   static int neg_one = -1;
   #endif
@@@ -131,6 -131,9 +132,9 @@@ static int min_percpu_pagelist_fract = 
   
   static int ngroups_max = NGROUPS_MAX;
   
+ #ifdef CONFIG_INOTIFY_USER
+ #include <linux/inotify.h>
+ #endif
   #ifdef CONFIG_SPARC
   #include <asm/system.h>
   #endif
@@@ -207,9 -210,6 +211,6 @@@ static struct ctl_table fs_table[]
   static struct ctl_table debug_table[];
   static struct ctl_table dev_table[];
   extern struct ctl_table random_table[];
- #ifdef CONFIG_INOTIFY_USER
- extern struct ctl_table inotify_table[];
- #endif
   #ifdef CONFIG_EPOLL
   extern struct ctl_table epoll_table[];
   #endif
@@@ -563,7 -563,7 +564,7 @@@ static struct ctl_table kern_table[] = 
                 .extra2         = &one,
         },
   #endif
- -#if defined(CONFIG_HOTPLUG) && defined(CONFIG_NET)
+ +#ifdef CONFIG_HOTPLUG
         {
                 .procname       = "hotplug",
                 .data           = &uevent_helper,
@@@ -711,34 -711,7 +712,34 @@@
                 .mode           = 0444,
                 .proc_handler   = proc_dointvec,
         },
- -#if defined(CONFIG_X86_LOCAL_APIC) && defined(CONFIG_X86)
+ +#if defined(CONFIG_LOCKUP_DETECTOR)
+ +      {
+ +              .procname       = "watchdog",
+ +              .data           = &watchdog_enabled,
+ +              .maxlen         = sizeof (int),
+ +              .mode           = 0644,
+ +              .proc_handler   = proc_dowatchdog_enabled,
+ +      },
+ +      {
+ +              .procname       = "watchdog_thresh",
+ +              .data           = &softlockup_thresh,
+ +              .maxlen         = sizeof(int),
+ +              .mode           = 0644,
+ +              .proc_handler   = proc_dowatchdog_thresh,
+ +              .extra1         = &neg_one,
+ +              .extra2         = &sixty,
+ +      },
+ +      {
+ +              .procname       = "softlockup_panic",
+ +              .data           = &softlockup_panic,
+ +              .maxlen         = sizeof(int),
+ +              .mode           = 0644,
+ +              .proc_handler   = proc_dointvec_minmax,
+ +              .extra1         = &zero,
+ +              .extra2         = &one,
+ +      },
+ +#endif
+ +#if defined(CONFIG_X86_LOCAL_APIC) && defined(CONFIG_X86) && !defined(CONFIG_LOCKUP_DETECTOR)
         {
                 .procname       = "unknown_nmi_panic",
                 .data           = &unknown_nmi_panic,
@@@ -841,6 -814,26 +842,6 @@@
                 .proc_handler   = proc_dointvec,
         },
   #endif
- -#ifdef CONFIG_DETECT_SOFTLOCKUP
- -      {
- -              .procname       = "softlockup_panic",
- -              .data           = &softlockup_panic,
- -              .maxlen         = sizeof(int),
- -              .mode           = 0644,
- -              .proc_handler   = proc_dointvec_minmax,
- -              .extra1         = &zero,
- -              .extra2         = &one,
- -      },
- -      {
- -              .procname       = "softlockup_thresh",
- -              .data           = &softlockup_thresh,
- -              .maxlen         = sizeof(int),
- -              .mode           = 0644,
- -              .proc_handler   = proc_dosoftlockup_thresh,
- -              .extra1         = &neg_one,
- -              .extra2         = &sixty,
- -      },
- -#endif
   #ifdef CONFIG_DETECT_HUNG_TASK
         {
                 .procname       = "hung_task_panic",
@@@ -914,6 -907,13 +915,6 @@@
                 .proc_handler   = proc_dointvec,
         },
   #endif
- -#ifdef CONFIG_SLOW_WORK
- -      {
- -              .procname       = "slow-work",
- -              .mode           = 0555,
- -              .child          = slow_work_sysctls,
- -      },
- -#endif
   #ifdef CONFIG_PERF_EVENTS
         {
                 .procname       = "perf_event_paranoid",
diff --combined security/security.c

index e8c87b8,f6ac27c..7461b1b
--- 1/security/security.c
--- 2/security/security.c
+++ b/security/security.c
@@@ -417,11 -417,12 +417,11 @@@ int security_path_rename(struct path *o
                                          new_dentry);
   }
   
- -int security_path_truncate(struct path *path, loff_t length,
- -                         unsigned int time_attrs)
+ +int security_path_truncate(struct path *path)
   {
         if (unlikely(IS_PRIVATE(path->dentry->d_inode)))
                 return 0;
- -      return security_ops->path_truncate(path, length, time_attrs);
+ +      return security_ops->path_truncate(path);
   }
   
   int security_path_chmod(struct dentry *dentry, struct vfsmount *mnt,
@@@ -619,7 -620,13 +619,13 @@@ void security_inode_getsecid(const stru
   
   int security_file_permission(struct file *file, int mask)
   {
-       return security_ops->file_permission(file, mask);
+       int ret;
+ 
+       ret = security_ops->file_permission(file, mask);
+       if (ret)
+               return ret;
+ 
+       return fsnotify_perm(file, mask);
   }
   
   int security_file_alloc(struct file *file)
@@@ -683,7 -690,13 +689,13 @@@ int security_file_receive(struct file *
   
   int security_dentry_open(struct file *file, const struct cred *cred)
   {
-       return security_ops->dentry_open(file, cred);
+       int ret;
+ 
+       ret = security_ops->dentry_open(file, cred);
+       if (ret)
+               return ret;
+ 
+       return fsnotify_perm(file, MAY_OPEN);
   }
   
   int security_task_create(unsigned long clone_flags)
Documentation/feature-removal-schedule.txt
fs/compat.c
fs/exec.c
fs/inode.c
fs/namei.c
fs/namespace.c
fs/nfsd/vfs.c
fs/notify/inode_mark.c
fs/open.c
include/linux/Kbuild
include/linux/fs.h
include/linux/security.h
include/linux/syscalls.h
init/Kconfig
kernel/Makefile
kernel/audit.c
kernel/sysctl.c
security/security.c
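
The OPEN_FMODE() change in include/linux/fs.h can be illustrated outside the kernel. The sketch below is a userspace approximation, not kernel code: the demo_* helpers and DEMO_* constants are hypothetical names, and the numeric value used for the FMODE_NONOTIFY bit is an assumption made for illustration only. It shows how adding 1 to the open(2) access mode yields the read/write bits, and how the new form additionally carries the no-notify bit through while the old form drops it.

```c
#include <fcntl.h>   /* O_RDONLY, O_WRONLY, O_RDWR, O_ACCMODE */

/*
 * Illustrative stand-ins for the kernel's fmode_t bits. The READ/WRITE
 * values follow from the (flag + 1) trick; the NONOTIFY value is an
 * assumption for this sketch, not taken from the diff.
 */
#define DEMO_FMODE_READ      0x1u
#define DEMO_FMODE_WRITE     0x2u
#define DEMO_FMODE_NONOTIFY  0x1000000u

/* Old macro body: only the access-mode bits survive the mask. */
static unsigned int demo_open_fmode_old(unsigned int flag)
{
	return (flag + 1) & O_ACCMODE;
}

/* New macro body: access-mode bits plus the no-notify bit, if set. */
static unsigned int demo_open_fmode_new(unsigned int flag)
{
	return ((flag + 1) & O_ACCMODE) | (flag & DEMO_FMODE_NONOTIFY);
}
```

On Linux, where O_RDONLY, O_WRONLY and O_RDWR are 0, 1 and 2, both helpers map them to the read bit, the write bit, and both bits respectively. They differ only when the no-notify bit is present in the flags.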