linux-dev - Linux kernel development work

Age	Commit message	Author	Files	Lines
2016-03-25	ceph: fix security xattr deadlock	Yan, Zheng	8	-11/+125
	When security is enabled, the security module can call the filesystem's getxattr/setxattr callbacks during d_instantiate(). For cephfs, d_instantiate() is usually called by the MDS dispatch thread while handling an MDS reply. If the MDS reply does not include xattrs and corresponding caps, getxattr/setxattr needs to send a new request to the MDS and wait for the reply. This makes the MDS dispatch thread sleep, so nobody handles later MDS replies. The fix is to make sure lookup/atomic_open replies include xattrs and corresponding caps, so getxattr can be served from cached xattrs. This requires some modification to both the MDS and the request message. (The client tells the MDS what caps it wants; the MDS encodes the proper caps in the reply.) The Smack security module may call setxattr during d_instantiate(). Unlike getxattr, we can't force the MDS to issue CEPH_CAP_XATTR_EXCL to us, so just make setxattr return an error when called by the MDS dispatch thread. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-03-25	ceph: don't request vxattrs from MDS	Yan, Zheng	1	-2/+4
	It's useless because the MDS reply does not carry any vxattr. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-03-25	ceph: fix mounting same fs multiple times	Yan, Zheng	1	-18/+15
	Now __ceph_open_session() only accepts a closed client. An opened client will trigger BUG_ON(). Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-03-25	ceph: remove unnecessary NULL check	Yan, Zheng	1	-2/+2
	If page->mapping is NULL, the releasepage() callback does not get called. Remove the unnecessary NULL check to make static code analysis tools happy. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-03-25	ceph: avoid updating directory inode's i_size accidentally	Yan, Zheng	1	-0/+4
	Directory inode's i_size is used by readdir cache. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-03-25	ceph: fix race during filling readdir cache	Yan, Zheng	1	-2/+7
	Readdir cache uses page cache to save dentry pointers. When adding dentry pointers to middle of a page, we need to make sure the page already exists. Otherwise the beginning part of the page will be invalid pointers. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-03-25	libceph: use sizeof_footer() more	Ilya Dryomov	1	-16/+3
	Don't open-code sizeof_footer() in read_partial_message() and ceph_msg_revoke(). Also, after switching to sizeof_footer(), it's now possible to use con_out_kvec_add() in prepare_write_message_footer(). Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org>
2016-03-25	ceph: kill ceph_empty_snapc	Ilya Dryomov	4	-34/+6
	ceph_empty_snapc->num_snaps == 0 at all times. Passing such a snapc to ceph_osdc_alloc_request() (possibly through ceph_osdc_new_request()) is equivalent to passing NULL, as ceph_osdc_alloc_request() uses it only for sizing the request message. Further, in all four cases the subsequent ceph_osdc_build_request() is passed NULL for snapc, meaning that 0 is encoded for seq and num_snaps and making ceph_empty_snapc entirely useless. The two cases where it actually mattered were removed in commits 860560904962 ("ceph: avoid sending unnessesary FLUSHSNAP message") and 23078637e054 ("ceph: fix queuing inode to mdsdir's snaprealm"). Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Yan, Zheng <zyan@redhat.com>
2016-03-25	ceph: fix a wrong comparison	Anton Protopopov	1	-1/+1
	A negative value rc was compared to the positive value ENOENT in the finish_read() function. Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com> Signed-off-by: Yan, Zheng <zyan@redhat.com>
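The class of bug fixed here can be sketched in user space. The block below is a minimal illustration, not the actual finish_read() code: the three function names are hypothetical, and the only thing taken from the commit is the idea that a kernel-style return code is negative, so it can never equal the positive ENOENT constant.

```c
#include <errno.h>

/* Hypothetical stand-in for a call that reports failure as a negative errno. */
static int demo_read_result_missing(void)
{
	return -ENOENT;
}

/* Buggy form: a negative rc never equals the positive constant. */
static int demo_is_missing_buggy(int rc)
{
	return rc == ENOENT;
}

/* Fixed form: compare against the negated constant. */
static int demo_is_missing_fixed(int rc)
{
	return rc == -ENOENT;
}
```

With rc = -ENOENT, the buggy check is false and the fixed check is true, which is the whole of the one-line change described above.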
2016-03-25	ceph: replace CURRENT_TIME by current_fs_time()	Deepa Dinamani	4	-6/+6
	CURRENT_TIME macro is not appropriate for filesystems as it doesn't use the right granularity for filesystem timestamps. Use current_fs_time() instead. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-03-25	ceph: scattered page writeback	Yan, Zheng	1	-109/+196
	This patch makes ceph_writepages_start() try to use a single OSD request to write all dirty pages within a stripe unit. When a nonconsecutive dirty page is found, ceph_writepages_start() tries to start a new write operation in the existing OSD request. If it succeeds, it uses the new operation to write back the dirty page. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-03-25	libceph: add helper that duplicates last extent operation	Yan, Zheng	2	-0/+24
	This helper duplicates the last extent operation in an OSD request, then adjusts the new extent operation's offset and length. The helper is for scattered page writeback, which adds nonconsecutive dirty pages to a single OSD request. Signed-off-by: Yan, Zheng <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-03-25	libceph: enable large, variable-sized OSD requests	Ilya Dryomov	3	-19/+32
	Turn r_ops into a flexible array member to enable large, consisting of up to 16 ops, OSD requests. The use case is scattered writeback in cephfs and, as far as the kernel client is concerned, 16 is just a made up number. r_ops had size 3 for copyup+hint+write, but copyup is really a special case - it can only happen once. ceph_osd_request_cache is therefore stuffed with num_ops=2 requests, anything bigger than that is allocated with kmalloc(). req_mempool is backed by ceph_osd_request_cache, which means either num_ops=1 or num_ops=2 for use_mempool=true - all existing users (ceph_writepages_start(), ceph_osdc_writepages()) are fine with that. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
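As a rough user-space illustration of the flexible-array-member layout described above, the sketch below sizes a request at allocation time. Everything named demo_* is hypothetical, and calloc() stands in for the slab cache and kmalloc() paths the commit mentions; only the limit of 16 ops comes from the message.

```c
#include <stddef.h>
#include <stdlib.h>

#define DEMO_MAX_OPS 16	/* upper bound quoted in the commit message */

struct demo_op {		/* hypothetical stand-in for one OSD op */
	unsigned int opcode;
};

struct demo_request {		/* hypothetical stand-in for an OSD request */
	unsigned int num_ops;
	struct demo_op ops[];	/* flexible array member, sized at allocation */
};

/* Allocate a zeroed request with room for exactly num_ops ops. */
static struct demo_request *demo_request_alloc(unsigned int num_ops)
{
	struct demo_request *req;

	if (num_ops == 0 || num_ops > DEMO_MAX_OPS)
		return NULL;

	req = calloc(1, sizeof(*req) + num_ops * sizeof(req->ops[0]));
	if (req)
		req->num_ops = num_ops;
	return req;
}
```

Because the array is the last member, a one-op request and a sixteen-op request share the same struct definition and differ only in how many bytes are allocated.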
2016-03-25	libceph: osdc->req_mempool should be backed by a slab pool	Ilya Dryomov	1	-2/+2
	ceph_osd_request_cache was introduced a long time ago. Also, osd_req is about to get a flexible array member, which ceph_osd_request_cache is going to be aware of. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-03-25	libceph: make r_request msg_size calculation clearer	Ilya Dryomov	1	-10/+11
	Although msg_size is calculated correctly, the terms are grouped in a misleading way - snaps appears to not have room for a u32 length. Move calculation closer to its use and regroup terms. No functional change. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-03-25	libceph: move r_reply_op_{len,result} into struct ceph_osd_req_op	Yan, Zheng	3	-5/+6
	This avoids defining a large array of r_reply_op_{len,result} in struct ceph_osd_request. Signed-off-by: Yan, Zheng <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-03-25	libceph: rename ceph_osd_req_op::payload_len to indata_len	Ilya Dryomov	2	-7/+7
	Follow userspace nomenclature on this - the next commit adds outdata_len. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-03-25	ceph: remove useless BUG_ON	Yan, Zheng	1	-2/+0
	ceph_osdc_start_request() never returns -EOLDSNAP. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-03-25	ceph: don't enable rbytes mount option by default	Yan, Zheng	2	-4/+3
	When the rbytes mount option is enabled, directory size is the recursive size. Recursive size is not updated instantly. This can cause directory size to change between successive stat(1) calls. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-03-25	ceph: encode ctime in cap message	Yan, Zheng	1	-4/+7
	Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-03-25	libceph: behave in mon_fault() if cur_mon < 0	Ilya Dryomov	1	-14/+9
	This can happen if __close_session() in ceph_monc_stop() races with a connection reset. We need to ignore such faults, otherwise it's likely we would take !hunting, call __schedule_delayed() and end up with delayed_work() executing on invalid memory, among other things. The (two!) con->private tests are useless, as nothing ever clears con->private. Nuke them. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-03-25	libceph: reschedule tick in mon_fault()	Ilya Dryomov	1	-4/+4
	Doing __schedule_delayed() in the hunting branch is pointless, as the tick will have already been scheduled by then. What we need to do instead is reschedule it in the !hunting branch, after reopen_session() changes hunt_mult, which affects the delay. This helps with spacing out connection attempts and avoiding things like two back-to-back attempts followed by a longer period of waiting around. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-03-25	libceph: introduce and switch to reopen_session()	Ilya Dryomov	1	-17/+16
	hunting is now set in __open_session() and cleared in finish_hunting(), instead of all around. The "session lost" message is printed not only on connection resets, but also on keepalive timeouts. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-03-25	libceph: monc hunt rate is 3s with backoff up to 30s	Ilya Dryomov	3	-9/+22
	Unless we are in the process of setting up a client (i.e. connecting to the monitor cluster for the first time), apply a backoff: every time we want to reopen a session, increase our timeout by a multiple (currently 2); when we complete the connection, reduce that multiplier by 50%. Mirrors ceph.git commit 794c86fd289bd62a35ed14368fa096c46736e9a2. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
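The backoff rule above can be sketched as a small state machine. This is an illustrative model only: the demo_* names are hypothetical, the 3s base and the factor of 2 come from the commit, and the cap of 10 on the multiplier is an assumption derived from 30s / 3s rather than something the message states.

```c
#define DEMO_HUNT_INTERVAL_SEC	3	/* base hunt rate from the commit title */
#define DEMO_HUNT_BACKOFF	2	/* growth factor per session reopen */
#define DEMO_HUNT_MAX_MULT	10	/* assumed cap: 3s * 10 = 30s */

struct demo_monc {
	int hunt_mult;	/* current multiplier, starts at 1 */
};

/* Called whenever a session has to be reopened: back off, up to the cap. */
static void demo_reopen_session(struct demo_monc *m)
{
	m->hunt_mult *= DEMO_HUNT_BACKOFF;
	if (m->hunt_mult > DEMO_HUNT_MAX_MULT)
		m->hunt_mult = DEMO_HUNT_MAX_MULT;
}

/* Called when a connection completes: halve the multiplier, never below 1. */
static void demo_finish_hunting(struct demo_monc *m)
{
	m->hunt_mult /= 2;
	if (m->hunt_mult < 1)
		m->hunt_mult = 1;
}

/* Delay before the next hunt attempt, in seconds. */
static int demo_hunt_delay_sec(const struct demo_monc *m)
{
	return DEMO_HUNT_INTERVAL_SEC * m->hunt_mult;
}
```

Starting from a multiplier of 1, four consecutive reopens give delays of 6s, 12s, 24s and 30s under these assumptions, and each completed connection then halves the multiplier again.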
2016-03-25	libceph: monc ping rate is 10s	Ilya Dryomov	3	-9/+5
	Split ping interval and ping timeout: ping interval is 10s; keepalive timeout is 30s. Make monc_ping_timeout a constant while at it - it's not actually exported as a mount option (and the rest of tick-related settings won't be either), so it's got no place in ceph_options. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-03-25	libceph: pick a different monitor when reconnecting	Ilya Dryomov	1	-28/+57
	Don't try to reconnect to the same monitor when we fail to establish a session within a timeout or it's lost. For that, pick_new_mon() needs to see the old value of cur_mon, so don't clear it in __close_session() - all calls to __close_session() but one are followed by __open_session() anyway. __open_session() is only called when a new session needs to be established, so the "already open?" branch, which is now in the way, is simply dropped. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-03-25	libceph: revamp subs code, switch to SUBSCRIBE2 protocol	Ilya Dryomov	8	-95/+174
	It is currently hard-coded in the mon_client that mdsmap and monmap subs are continuous, while osdmap sub is always "onetime". To better handle full clusters/pools in the osd_client, we need to be able to issue continuous osdmap subs. Revamp subs code to allow us to specify for each sub whether it should be continuous or not. Although not strictly required for the above, switch to SUBSCRIBE2 protocol while at it, eliminating the ambiguity between a request for "every map since X" and a request for "just the latest" when we don't have a map yet (i.e. have epoch 0). SUBSCRIBE2 feature bit is now required - it's been supported since pre-argonaut (2010). Move "got mdsmap" call to the end of ceph_mdsc_handle_map() - calling it before we validate the epoch and successfully install the new map can mess up mon_client sub state. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-03-25	libceph: decouple hunting and subs management	Ilya Dryomov	1	-9/+22
	Coupling hunting state with subscribe state is not a good idea. Clear hunting when we complete the authentication handshake. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-03-25	libceph: move debugfs initialization into __ceph_open_session()	Ilya Dryomov	2	-51/+4
	Our debugfs dir name is a concatenation of cluster fsid and client unique ID ("global_id"). It used to be the case that we learned global_id first, nowadays we always learn fsid first - the monmap is sent before any auth replies are. ceph_debugfs_client_init() call in ceph_monc_handle_map() is therefore never executed and can be removed. Its counterpart in handle_auth_reply() doesn't really belong there either: having to do monc->client and unlocking early to work around lockdep is a testament to that. Move it into __ceph_open_session(), where it can be called unconditionally. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-03-18	bonding: fix bond_get_stats()	Eric Dumazet	2	-31/+36
	bond_get_stats() can be called from rtnetlink (with RTNL held) or from /proc/net/dev seq handler (with RCU held). The logic added in commit 5f0c5f73e5ef ("bonding: make global bonding stats more reliable") kind of assumed only one cpu could run there. If multiple threads are reading /proc/net/dev, stats can be really messed up after a while. A second problem is that some fields are 32bit, so we need to properly handle the wrap around problem. Given that RTNL is not always held, we need to use bond_for_each_slave_rcu(). Fixes: 5f0c5f73e5ef ("bonding: make global bonding stats more reliable") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Andy Gospodarek <gospo@cumulusnetworks.com> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-18	net: bcmgenet: fix dma api length mismatch	Eric Dumazet	1	-2/+2
	When un-mapping skb->data in __bcmgenet_tx_reclaim(), we must use the length that was used in the original dma_map_single(), instead of skb->len, which might be bigger (it includes the frags). We can simply store skb_len into tx_cb_ptr->dma_len and use it at unmap time. Fixes: 1c1008c793fa ("net: bcmgenet: add main driver file") Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-18	net/mlx4_core: Fix backward compatibility on VFs	Eli Cohen	1	-6/+18
	Commit 85743f1eb345 ("net/mlx4_core: Set UAR page size to 4KB regardless of system page size") introduced dependency where old VF drivers without this fix fail to load if the PF driver runs with this commit. To resolve this add a module parameter which disables that functionality by default. If both the PF and VFs are running with a driver with that commit the administrator may set the module param to true. The module parameter is called enable_4k_uar. Fixes: 85743f1eb345 ('net/mlx4_core: Set UAR page size to 4KB ...') Signed-off-by: Eli Cohen <eli@mellanox.com> Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-18	phy: mdio-thunder: Fix some Kconfig typos	Andreas Färber	1	-2/+2
	Drop two extra occurrences of "on" in option title and help text. Fixes: 379d7ac7ca31 ("phy: mdio-thunder: Add driver for Cavium Thunder SoC MDIO buses.") Cc: David Daney <david.daney@cavium.com> Signed-off-by: Andreas Färber <afaerber@suse.de> Acked-by: David Daney <david.daney@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-18	lan78xx: add ndo_get_stats64	Woojung Huh	1	-0/+49
	Add lan78xx_get_stats64 of ndo_get_stats64 to report all statistics counters including errors from HW statistics. Read from HW when auto suspend is disabled, use saved counter when auto suspend is enabled because periodic call to ndo_get_stats64 prevents USB auto suspend. Signed-off-by: Woojung Huh <woojung.huh@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-18	lan78xx: handle statistics counter rollover	Woojung Huh	1	-13/+239
	Update to handle statistics counter rollover. Check statistics counter periodically and compensate it when counter value rolls over at max (20 or 32bits). Simple mechanism adjusts monitoring timer to allow USB auto suspend. Signed-off-by: Woojung Huh <woojung.huh@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
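One common way to compensate for a narrow hardware counter rolling over is sketched below. It is not the lan78xx implementation: the demo_* names are hypothetical, and the sketch assumes the counter is polled often enough that it wraps at most once between two reads.

```c
#include <stdint.h>

struct demo_counter {
	uint32_t max;		/* largest raw value: 0xFFFFF (20-bit) or 0xFFFFFFFF (32-bit) */
	uint32_t last_raw;	/* raw value seen at the previous poll */
	uint64_t total;		/* accumulated value that never rolls over in practice */
};

/* Fold a freshly read raw value into the 64-bit running total. */
static void demo_counter_update(struct demo_counter *c, uint32_t raw)
{
	uint64_t delta;

	if (raw >= c->last_raw)
		delta = raw - c->last_raw;
	else	/* counter wrapped past max back to zero */
		delta = (uint64_t)c->max - c->last_raw + raw + 1;

	c->total += delta;
	c->last_raw = raw;
}
```

The periodic check mentioned in the message is what makes the single-wrap assumption hold; if polls were too far apart, a second rollover would be silently lost.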
2016-03-18	RDS: TCP: Remove unused constant	Sowmini Varadhan	1	-2/+0
	RDS_TCP_DEFAULT_BUFSIZE has been unused since commit 1edd6a14d24f ("RDS-TCP: Do not bloat sndbuf/rcvbuf in rds_tcp_tune"). Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-18	RDS: TCP: Add sysctl tunables for sndbuf/rcvbuf on rds-tcp socket	Sowmini Varadhan	1	-10/+135
	Add per-net sysctl tunables to set the size of sndbuf and rcvbuf on the kernel tcp socket. The tunables are added at /proc/sys/net/rds/tcp/rds_tcp_sndbuf and /proc/sys/net/rds/tcp/rds_tcp_rcvbuf. These values must be set before accept() or connect(), and there may be an arbitrary number of existing rds-tcp sockets when the tunable is modified. To make sure that all connections in the netns pick up the same value for the tunable, we reset existing rds-tcp connections in the netns, so that they can reconnect with the new parameters. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-18	net: smc911x: convert pxa dma to dmaengine	Robert Jarzmik	2	-66/+82
	Convert the dma transfers to be dmaengine based, now pxa has a dmaengine slave driver. This makes this driver a bit more PXA agnostic. The driver was only compile tested. The risk is quite small as no current PXA platform I'm aware of is using smc911x driver. Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr> Tested-by: Fabio Estevam <fabio.estevam@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-18	team: remove duplicate set of flag IFF_MULTICAST	Zhang Shengju	1	-1/+0
	Remove unnecessary set of flag IFF_MULTICAST, since ether_setup already does this. Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-18	bonding: remove duplicate set of flag IFF_MULTICAST	Zhang Shengju	1	-1/+1
	Remove unnecessary set of flag IFF_MULTICAST, since ether_setup already does this. Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-18	net: fix a comment typo	Zhang Shengju	1	-1/+1
	Fix a comment typo. Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-18	ethernet: micrel: fix some error codes	Dan Carpenter	1	-4/+6
	There were two issues here: 1) dma_mapping_error() returns true/false but we want to return -ENOMEM 2) If dmaengine_prep_slave_sg() failed then "err" wasn't set but presumably that should be -ENOMEM as well. I changed the success path to "return 0;" instead of "return ret;" for clarity. Fixes: 94fe8c683cea ('ks8842: Support DMA when accessed via timberdale') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-18	ip_tunnels, bpf: define IP_TUNNEL_OPTS_MAX and use it	Daniel Borkmann	4	-8/+16
	eBPF defines this as BPF_TUNLEN_MAX and OVS just uses the hard-coded value inside struct sw_flow_key. Thus, add and use IP_TUNNEL_OPTS_MAX for this, which makes the code a bit more generic and allows to remove BPF_TUNLEN_MAX from eBPF code. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-18	bpf, dst: add and use dst_tclassid helper	Daniel Borkmann	2	-8/+13
	We can just add a small helper dst_tclassid() for retrieving the dst->tclassid value. It makes the code a bit better in that we can get rid of the ifdef from filter.c by moving this into the header. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-18	bpf: make skb->tc_classid also readable	Daniel Borkmann	1	-6/+6
	Currently, the tc_classid from eBPF skb context is write-only, but there's no good reason for tc programs to limit it to write-only. For example, it can be used to transfer its state via tail calls where the resulting tc_classid gets filled gradually. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-18	net: mvneta: bm: clarify dependencies	Arnd Bergmann	1	-2/+10
	MVNETA_BM has a dependency on MVNETA, so we can only select the former if the latter is enabled. However, the code dependency is the reverse: The mvneta module can call into the mvneta_bm module, so mvneta cannot be a built-in if mvneta_bm is a module, or we get a link error: drivers/net/built-in.o: In function `mvneta_remove': drivers/net/ethernet/marvell/mvneta.c:4211: undefined reference to `mvneta_bm_pool_destroy' drivers/net/built-in.o: In function `mvneta_bm_update_mtu': drivers/net/ethernet/marvell/mvneta.c:1034: undefined reference to `mvneta_bm_bufs_free' This avoids the problem by further clarifying the dependency so that MVNETA_BM is a silent Kconfig option that gets turned on by the new MVNETA_BM_ENABLE option. This way both the core HWBM module and the MVNETA_BM code are always built-in when needed. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: dc35a10f68d3 ("net: mvneta: bm: add support for hardware buffer management") Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-18	cls_bpf: reset class and reuse major in da	Daniel Borkmann	1	-5/+8
	There are two issues with the current code. First one is that we need to set res->class to 0 in case we use non-default classid matching. This is important for the case where cls_bpf was initially set up with an optional binding to a default class with tcf_bind_filter(), where the underlying qdisc implements bind_tcf() that fills res->class and tests for it later on when doing the classification. Convention for these cases is that after tc_classify() was called, such qdiscs (atm, drr, qfq, cbq, hfsc, htb) first test class, and if 0, then they lookup based on classid. Second, there's a bug with da mode, where res->classid is only assigned a 16 bit minor, but it needs to expand to the full 32 bit major/minor combination instead, therefore we need to expand with the bound major. This is fine as classes belonging to a classful qdisc must share the same major. Fixes: 045efa82ff56 ("cls_bpf: introduce integrated actions") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
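The second fix, expanding a 16-bit minor into a full 32-bit classid using the bound major, comes down to simple masking. The sketch below uses hypothetical demo_* names and local mask constants; the kernel has its own TC_H_MAJ(), TC_H_MIN() and TC_H_MAKE() helpers for this in linux/pkt_sched.h, and the sketch does not claim to match the exact expression used in cls_bpf.

```c
#include <stdint.h>

#define DEMO_H_MAJ_MASK	0xFFFF0000U	/* upper 16 bits: major (qdisc handle) */
#define DEMO_H_MIN_MASK	0x0000FFFFU	/* lower 16 bits: minor (class) */

/*
 * Combine the major of the classid the filter is bound to with a 16-bit
 * minor returned by the program, giving a full major:minor classid.
 */
static uint32_t demo_classid_expand(uint32_t bound_classid, uint32_t minor)
{
	return (bound_classid & DEMO_H_MAJ_MASK) | (minor & DEMO_H_MIN_MASK);
}
```

For a filter bound to 1:0 (0x00010000), a program returning minor 2 yields 0x00010002, that is class 1:2. Reusing the bound major is safe for the reason the message gives: all classes of a classful qdisc share the same major.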
2016-03-18	ldmvsw: Checkpatch sunvnet.c and sunvnet_common.c	Aaron Young	2	-50/+56
	Checkpatch updates for sunvnet.c and sunvnet_common.c. Signed-off-by: Aaron Young <aaron.young@oracle.com> Signed-off-by: Rashmi Narasimhan <rashmi.narasimhan@oracle.com> Reviewed-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Reviewed-by: Alexandre Chartre <Alexandre.Chartre@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-18	ldmvsw: Add ldmvsw.c driver code	Aaron Young	4	-0/+481
	Add ldmvsw.c driver Details: The ldmvsw driver very closely follows the sunvnet.c code and makes use of the sunvnet_common.c code for core functionality. A significant difference between sunvnet and ldmvsw driver is sunvnet creates a network interface for each vnet-port parent node in the MD while the ldmvsw driver creates a network interface for every vsw-port node in the Machine Description (MD). Therefore the netdev_priv() for sunvnet is a vnet structure while the netdev_priv() for ldmvsw is a vnet_port structure. Vnet_port structures allocated by ldmvsw have the vsw bit set. When finding the net_device associated with a port, the common code keys off this bit to use either the net_device found in the vnet_port or the net_device in the vnet structure (see the VNET_PORT_TO_NET_DEVICE() macro in sunvnet_common.h). This scheme allows the common code to work with both drivers with minimal changes. Similar to Xen, network interfaces created by the ldmvsw driver will always have a HW Addr (i.e. mac address) of FE:FF:FF:FF:FF:FF and each will be assigned the devname "vif<cfg_handle>.<port_id>" - where <cfg_handle> and <port_id> are a unique handle/port pair assigned to the associated vsw-port node in the MD. Signed-off-by: Aaron Young <aaron.young@oracle.com> Signed-off-by: Rashmi Narasimhan <rashmi.narasimhan@oracle.com> Reviewed-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Reviewed-by: Alexandre Chartre <Alexandre.Chartre@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-18	ldmvsw: Make sunvnet_common compatible with ldmvsw	Aaron Young	3	-70/+121
	Modify sunvnet common code and data structures to be compatible with both sunvnet and ldmvsw drivers. Details: Sunvnet operates on "vnet-port" nodes which appear in the Machine Description (MD) in a guest domain. Ldmvsw operates on "vsw-port" nodes which appear in the MD of a service domain. A difference between the sunvnet driver and the ldmvsw driver is the sunvnet driver creates a network interface (i.e. a struct net_device) for every vnet-port parent "network" node. Several vnet-ports may appear under this common parent network node - each corresponding to a common parent network interface. Conversely, since bridge/vswitch software will need to interface with every vsw-port in a system, the ldmvsw driver creates a network interface (i.e. a struct net_device) for every vsw-port - not every parent node as with sunvnet. This difference required some special handling in the common code as explained below. There are 2 key data structures used by the sunvnet and ldmvsw drivers (which are now found in sunvnet_common.h): 1. struct vnet_port This structure represents a vnet-port node in sunvnet and a vsw-port in the ldmvsw driver. 2. struct vnet This structure represents a parent "network" node in sunvnet and a parent "virtual-network-switch" node in ldmvsw. Since the sunvnet driver allocates a net_device for every parent "network" node, a net_device member appears in the struct vnet. Since the ldmvsw driver allocates a net_device for every port, a net_device member was added to the vnet_port. The common code distinguishes which structure net_device member to use by checking a 'vsw' bit that was added to the vnet_port structure. See the VNET_PORT_TO_NET_DEVICE() macro in sunvnet_common.h. The netdev_priv() in sunvnet is allocated as a vnet. The netdev_priv() in ldmvsw is a vnet_port.
Therefore, any place in the common code where a netdev_priv() call was made, a wrapper function was implemented in each driver to first get the vnet and/or vnet_port (in a driver specific way) and pass them as newly added parameters to the common functions (see wrapper funcs: vnet_set_rx_mode() and vnet_poll_controller()). Since these wrapper functions call __tx_port_find(), __tx_port_find() was moved from the common code back into sunvnet.c. Note - ldmvsw.c does not require this function. These changes also required that port_is_up() be made into a common function and thus it was given a _common suffix and exported like the other common functions. A wrapper function was also added for vnet_start_xmit_common() to pass a driver-specific function arg to return the port associated with a given struct sk_buff and struct net_device. This was required because vnet_start_xmit_common() grabs a lock prior to getting the associated port. Using a function pointer arg allowed the code to work unchanged without risking changes to the non-trivial locking logic in vnet_start_xmit_common(). Signed-off-by: Aaron Young <aaron.young@oracle.com> Signed-off-by: Rashmi Narasimhan <rashmi.narasimhan@oracle.com> Reviewed-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Reviewed-by: Alexandre Chartre <Alexandre.Chartre@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>